Abstract - We present novel techniques for analyzing the
problem of low-rank matrix recovery. The methods are both 
considerably simpler and more general than previous approaches. 
It is shown that an unknown $n \times n$ matrix of rank $r$ can
be efficiently reconstructed from only $O(nr\nu \ln^2 n)$ randomly
sampled expansion coefficients with respect to any given matrix
basis. The number $\nu$ quantifies the "degree of incoherence"
between the unknown matrix and the basis. Existing work 
concentrated mostly on the problem of "matrix completion" 
where one aims to recover a low-rank matrix from randomly 
selected matrix elements. Our result covers this situation as a 
special case. The proof consists of a series of relatively elementary 
steps, which stands in contrast to the highly involved methods 
previously employed to obtain comparable results. In cases where 
bounds had been known before, our estimates are slightly tighter. 
We discuss operator bases which are incoherent to all low-rank
matrices simultaneously. For these bases, we show that
$O(nr\nu \ln n)$ randomly sampled expansion coefficients suffice to
recover any low-rank matrix with high probability. The latter 
bound is tight up to multiplicative constants. 

Index Terms — Matrix completion, matrix recovery, compressed 
sensing, operator large-deviation bound, quantum-state tomog- 
raphy 



I. Introduction 

We consider the problem of efficiently recovering a low- 
rank matrix from a small number of expansion coefficients 
with respect to some basis in the space of matrices. Related 
questions have recently enjoyed a substantial amount of attention
(cf. [1], [2], [3], [4], [5], [6], [7] for a highly incomplete
list of references).

To get some intuition for the problem, note that one needs
roughly $rn$ parameters to specify an $n \times n$ matrix $\rho$ of rank $r$.
Therefore, it might be surmised that about the same number
of expansion coefficients of $\rho$ (with respect to some fixed
matrix basis) are sufficient to uniquely specify $\rho$ within the
set of low-rank matrices. It is far less clear whether $\rho$
can be recovered from this limited set of coefficients in a
computationally tractable way.

Low-rank matrix recovery may be compared to a technique 
studied under the name of compressed sensing [8], [9], [10].
In its simplest version, the task there is to recover a sparse 
vector from few Fourier coefficients. Informally, the property 
of having a low rank is the "non-commutative analogue" of 
sparsity. In this sense, one may think of the matrix recovery 
problem as a non-commutative version of compressed sensing. 
This field of research was started in earnest with the results
in [2], [3]. There, it was shown that, surprisingly, reconstructing
a rank-$r$ matrix from only $O(nr\,\mathrm{polylog}(n))$ randomly
selected matrix elements can be done efficiently employing 
a simple convex optimization algorithm. These findings were 
partly inspired by methods used earlier in compressed sensing 

anna. 

The results presented in [2], [3] were as spectacular as they
were difficult to prove; the tighter bounds in [3] required
dozens of pages. At the same time, the proof techniques 
seemed to be tailored to the fact that matrix elements, as 
opposed to more general expansion coefficients, had been 
sampled. 

In [11], the present author and collaborators developed new
methods for analyzing low-rank matrix recovery problems.
The work was motivated by the desire to prove analogues of
[2], [3] applicable to certain problems in quantum mechanics.
Three main improvements were achieved. Most importantly, 
the mathematical effort for obtaining near-optimal bounds on 
the number of coefficients needed to determine a low-rank 
matrix was cut dramatically, with a condensed (but complete) 
version of the proof fitting on a single page. Also, the new 
arguments depend much less on the specific properties of the 
basis used. Lastly, in some situations, the bounds obtained are 
tighter than those presented previously. In some cases, the gap 
between lower and upper bounds is reduced to a multiplicative 
constant. 

The present paper builds on the methods of [11]. It aims
to make them accessible to readers not accustomed to the
language of quantum information theory, supplies many details
missing in [11] due to space limitations, generalizes the results
to arbitrary operator bases, and provides tighter estimates.

A. Setting and main results 

Throughout the main part of this paper the word "matrix" 
will be used to mean "Hermitian matrix" (or, equivalently, 
"symmetric matrix", if one prefers to work over the real 
numbers). Our methods work more naturally in this setting, 
and a lack of Hermiticity would just be a technical problem 
obscuring the essence of the argument. In fact, little generality
is lost. In Section III-D we describe a straightforward way
of translating any non-Hermitian matrix recovery problem into
a Hermitian one. Therefore, in essence, all our results include
this more general case.

The unknown rank-$r$ matrix to be recovered will be denoted
by $\rho$. On the space of Hermitian matrices, we use the Hilbert-Schmidt
inner product $(\sigma_1, \sigma_2) = \operatorname{tr}(\sigma_1^\dagger \sigma_2)$. We assume that
some ortho-normal basis $\{w_a\}_{a=1}^{n^2}$ with respect to this inner
product has been chosen (referred to as an operator basis).
Thus, $\rho$ can be expanded as

$$\rho = \sum_{a=1}^{n^2} (w_a, \rho)\, w_a.$$

The question addressed below is: given that $\operatorname{rank} \rho \leq r$,
how many randomly chosen coefficients $(w_a, \rho)$ do we need
to know before we can efficiently reconstruct $\rho$?

In order to perform the reconstruction, we will utilize the
algorithm employed in [1], [2], [3], [4]. Let $\Omega \subset [1, n^2]$ be a
random set of size $m$. Assume that we know the coefficients
$(w_a, \rho)$ for all $a \in \Omega$. The algorithm simply consists of
performing the following (efficiently implementable) convex 
optimization over the space of matrices: 

$$\min \|\sigma\|_1 \quad \text{subject to } (\sigma, w_a) = (\rho, w_a), \ \forall a \in \Omega. \qquad (1)$$

Above, $\|\sigma\|_1$ is the trace-norm (also Schatten 1-norm or
nuclear norm), i.e. the sum of the singular values of $\sigma$. Let
$\sigma^\star$ be a solution of the optimization. Theorem 3 quantifies the
probability (with respect to the sampling process) of $\sigma^\star$ being
unique and equal to $\rho$, as a function of the number $m$ of
coefficients revealed.
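As a small numerical illustration of the trace-norm that is minimized in (1), the following sketch computes it from the singular values with NumPy. The helper name `trace_norm` and the two test matrices are illustrative assumptions and do not appear in the original text; the sketch does not solve the optimization problem itself.

```python
import numpy as np

def trace_norm(sigma):
    """Trace norm (Schatten 1-norm, nuclear norm): sum of the singular values."""
    return float(np.linalg.svd(sigma, compute_uv=False).sum())

# A rank-1 projector has a single non-zero singular value, equal to 1.
v = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
rank_one = np.outer(v, v)

# For a Hermitian matrix the singular values are the absolute eigenvalues.
hermitian = np.diag([3.0, -4.0, 0.0])

print(trace_norm(rank_one))   # approximately 1
print(trace_norm(hermitian))  # approximately 7
```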

It is clear that the algorithm will perform poorly if $\rho$ has
very few non-zero expansion coefficients with respect to the
basis $\{w_a\}$ [2]. To avoid such a situation, we must ensure
that a typical coefficient will contain "enough non-trivial
information" about $\rho$. That is the content of the various notions
of "incoherence" which have been proposed [2], [3]. Our
definition of incoherence is stated below. It is closely related
to, but more general than, the parameter $\mu$ used in [2], [3]. In
particular, going beyond previously published situations, we
find that there are certain bases with the property that any
low-rank matrix is incoherent with respect to them.

To state the results more precisely, we need to introduce
some notation. (We try to follow [2] as closely as possible.)
Let $U = \operatorname{range} \rho$ be the row space of $\rho$ (which is equal to its
column space, due to Hermiticity). Let $P_U$ be the orthogonal
projection onto $U$. The space of matrices

$$T = \{\sigma \mid (1 - P_U)\, \sigma\, (1 - P_U) = 0\} \qquad (2)$$

whose compression to $\ker \rho$ vanishes will play an important
role ($1$ is the identity matrix; see also Fig. 2). The map

$$\mathcal{P}_T : \sigma \mapsto P_U \sigma + \sigma P_U - P_U \sigma P_U$$

projects¹ onto $T$. Whenever there is little danger of confusion,
we will not make the dependency of $T$, $\mathcal{P}_T$ and other objects
on $\rho$ explicit in our notation.
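The projection onto T can be checked numerically. The sketch below is a minimal illustration, assuming NumPy and randomly generated test matrices; the function names `range_projector` and `project_onto_T` are hypothetical helpers, not notation from the original. It verifies that the map is idempotent, that the compression of its image to the kernel of the low-rank matrix vanishes, and that the low-rank matrix itself lies in T.

```python
import numpy as np

def range_projector(rho, tol=1e-10):
    """Orthogonal projector P_U onto the range of a Hermitian matrix rho."""
    eigvals, eigvecs = np.linalg.eigh(rho)
    U = eigvecs[:, np.abs(eigvals) > tol]
    return U @ U.conj().T

def project_onto_T(sigma, P_U):
    """The map sigma -> P_U sigma + sigma P_U - P_U sigma P_U."""
    return P_U @ sigma + sigma @ P_U - P_U @ sigma @ P_U

rng = np.random.default_rng(0)
n, r = 6, 2

# Random rank-r Hermitian matrix rho = G G^dagger (assumed test data).
G = rng.normal(size=(n, r)) + 1j * rng.normal(size=(n, r))
rho = G @ G.conj().T

# Random Hermitian matrix to be projected.
H = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
sigma = (H + H.conj().T) / 2

P_U = range_projector(rho)
Q = np.eye(n) - P_U            # projector onto ker(rho)
sigma_T = project_onto_T(sigma, P_U)

print(np.allclose(project_onto_T(sigma_T, P_U), sigma_T))  # idempotence
print(np.allclose(Q @ sigma_T @ Q, 0))                     # defining property of T
```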

Recall the definition of the sign function: $\operatorname{sgn}(x) = x/|x|$
for $x \neq 0$ and $\operatorname{sgn}(0) = 0$. Below, we will apply the sign
function (and other real functions) to Hermitian matrices.
Expressions like $\operatorname{sgn} \sigma$ are to be understood in terms of the
usual "functional calculus", i.e. $\operatorname{sgn} \sigma$ is the matrix which is
diagonal in the same basis as $\sigma$, but with eigenvalues $\operatorname{sgn}(\lambda_i)$,
where the $\lambda_i$ are the eigenvalues of $\sigma$.
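The functional calculus for the sign function can be sketched in a few lines. The code below assumes NumPy; the helper `matrix_sign`, the tolerance used to decide which eigenvalues count as zero, and the test matrix are illustrative assumptions. It also checks the standard fact that the Hilbert-Schmidt inner product of a Hermitian matrix with its own sign equals its trace-norm.

```python
import numpy as np

def matrix_sign(sigma, tol=1e-12):
    """Apply sgn to the eigenvalues of a Hermitian matrix (functional calculus)."""
    eigvals, eigvecs = np.linalg.eigh(sigma)
    # Eigenvalues below the tolerance are treated as exactly zero.
    signs = np.where(np.abs(eigvals) > tol, np.sign(eigvals), 0.0)
    # Scale each eigenvector column by its sign, then rotate back.
    return (eigvecs * signs) @ eigvecs.conj().T

# Hermitian test matrix with eigenvalues 2, -3, 0 in a rotated basis.
angle = 0.3
c, s = np.cos(angle), np.sin(angle)
V = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
rho = V @ np.diag([2.0, -3.0, 0.0]) @ V.T

sgn_rho = matrix_sign(rho)
expected = V @ np.diag([1.0, -1.0, 0.0]) @ V.T

# (sgn rho, rho) = tr(sgn(rho)^dagger rho) equals |2| + |-3| + |0|.
overlap = np.trace(sgn_rho.conj().T @ rho).real
print(overlap)  # approximately 5
```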

¹ We will use calligraphic $\mathcal{P}$'s for matrix-valued projections, and roman
$P$'s for vector-valued projections.



The unadorned norm $\|\sigma\|$ of a matrix $\sigma$ refers to the
operator norm (or spectral norm): the largest singular value.
The 2-norm (also Frobenius norm) is $\|\sigma\|_2 = (\operatorname{tr} \sigma^2)^{1/2}$.

We can now state our definition of coherence. 

Definition 1 (Coherence). The $n \times n$ matrix $\rho$ has coherence
$\nu$ with respect to an operator basis $\{w_a\}_{a=1}^{n^2}$ if either

$$\max_a \|w_a\|^2 \leq \nu \frac{1}{n} \qquad (3)$$

or the two estimates

$$\max_a \|\mathcal{P}_T w_a\|_2^2 \leq 2\nu \frac{r}{n}, \qquad (4)$$

$$\max_a (w_a, \operatorname{sgn} \rho)^2 \leq \nu \frac{r}{n^2} \qquad (5)$$

hold.



Let $\{e_1, \dots, e_n\}$ be the standard basis in $\mathbb{C}^n$. The (non-Hermitian)
standard operator basis is $\{e_i e_j^\dagger\}_{i,j=1}^n$, where $e_i e_j^\dagger$
is the matrix whose only non-zero element is a 1 at the
intersection of the $i$th row and the $j$th column. The best
previously known result seems to be this:

Theorem 2 (0 Thm. 1.1]). Let p be a rank-r matrix with 
coherence v with respect to the standard operator basis. Let 
Oc [1, n 2 ] be a random set of size |f2| > Oijirv^ In 2 n). Then 
the solution a* of the optimization problem (Q is unique and 
equal to p with probability at least 1 
Our main theorem works for arbitrary operator bases, improves
the $\nu$-dependency and will turn out to be easier to
prove.

Theorem 3 (Main result). Let $\rho$ be a rank-$r$ matrix with
coherence $\nu$ with respect to an operator basis $\{w_a\}_{a=1}^{n^2}$. Let
$\Omega \subset [1, n^2]$ be a random set of size $|\Omega| \geq O(nr\nu(1+\beta)\ln^2 n)$.
Then the solution $\sigma^\star$ of the optimization problem (1) is unique
and equal to $\rho$ with probability at least $1 - n^{-\beta}$.

The precise condition on $|\Omega|$ for the statement in Theorem 3
to hold is

$$|\Omega| > \log_2(2n^2\sqrt{r})\; 64\,\nu \left(\ln(4n^2) + \ln(9\log_2 n) + \beta \ln n\right) r n.$$

(No attempt has been made to optimize the constants appearing
in this expression.) In the expositional part of this paper, we
will frequently employ the "big-Oh" notation² to give simplified
accounts of otherwise complex expressions. However, in
the more technical sections, it will be shown that all statements
hold for any finite $n$ (and not just asymptotically, as the $O$-notation
might suggest) and all constants will be worked out
explicitly.
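To get a feeling for the size of the explicit constants, the condition can be evaluated numerically. The sketch below assumes that the condition reads |Ω| > log2(2 n² sqrt(r)) · 64 ν (ln(4n²) + ln(9 log2 n) + β ln n) r n; the function name `required_samples` and the parameter values are illustrative assumptions.

```python
import math

def required_samples(n, r, nu, beta):
    """Right-hand side of the explicit sample-size condition (as read above)."""
    log_factor = math.log2(2 * n**2 * math.sqrt(r))
    bracket = (math.log(4 * n**2)
               + math.log(9 * math.log2(n))
               + beta * math.log(n))
    return log_factor * 64 * nu * bracket * r * n

n = 1024
bound = required_samples(n, r=1, nu=1.0, beta=1.0)

# Compare with the total number n^2 of expansion coefficients.
print(bound, n**2)
```

Under this reading, the right-hand side still exceeds the total number n² of coefficients at n = 1024, which is consistent with the remark that the constants have not been optimized; the scaling in n, r and ν is the informative part.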

We remark that the only property of the basis $\{w_a\}$ itself
that has entered the discussion so far is its operator norm
$\max_a \|w_a\|$. Intuitively, the reason is easily understood: matrices
with small operator norm are "incoherent" to all low-rank
matrices simultaneously. More precisely: if $\rho$ is a matrix
of rank $r$, normalized such that $\|\rho\|_2 = 1$, then Hölder's
inequality for matrices [12, Corollary IV.2.6] gives the estimate

$$|(w, \rho)|^2 \leq \|w\|^2 \|\rho\|_1^2 \leq \|w\|^2 r \qquad (6)$$



² We write $|\Omega| \geq O(f(n, r, \nu, \beta))$ if there is a constant $C$ such that for $n$
large enough and for all $\nu$, $\beta$ and $r \leq n$, it holds that $|\Omega| \geq C f(n, r, \nu, \beta)$.
for any matrix $w$. Hence the squared overlap on the left hand
side is small if both $r$ and $\|w\|$ are. As a corollary, we can
actually derive (4) from (3). Indeed

$$\|\mathcal{P}_T w_a\|_2^2 = \sup_{\sigma \in T,\, \|\sigma\|_2 = 1} (w_a, \sigma)^2 \leq \|w_a\|^2\, 2r\, \|\sigma\|_2^2 \leq 2\nu \frac{r}{n}$$

(having used the simple fact that $\max_{\sigma \in T} (\operatorname{rank} \sigma) = 2r$).
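The chain of inequalities in (6) can be checked on random instances. The following sketch assumes NumPy and randomly generated test matrices; all variable names are illustrative. It compares the squared overlap with the product of squared operator norm and squared trace-norm, and then with the squared operator norm times the rank, for a rank-r matrix of unit 2-norm.

```python
import numpy as np

def singular_values(a):
    return np.linalg.svd(a, compute_uv=False)

rng = np.random.default_rng(1)
n, r = 8, 2

# Random rank-r Hermitian rho, normalized to unit 2-norm (assumed test data).
G = rng.normal(size=(n, r)) + 1j * rng.normal(size=(n, r))
rho = G @ G.conj().T
rho /= np.linalg.norm(rho)  # Frobenius norm for 2-D arrays

# Random Hermitian w.
H = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
w = (H + H.conj().T) / 2

overlap_sq = abs(np.trace(w.conj().T @ rho)) ** 2   # |(w, rho)|^2
op_norm_sq = singular_values(w).max() ** 2          # ||w||^2
trace_norm_sq = singular_values(rho).sum() ** 2     # ||rho||_1^2

lhs, middle, rhs = overlap_sq, op_norm_sq * trace_norm_sq, op_norm_sq * r
print(lhs <= middle <= rhs)
```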

Equation (6) has a well-known analogue in compressed
sensing [8], [9], [10]. There, one uses the fact that "vectors
with small entries" are incoherent to "sparse vectors". Indeed,
if $\sigma_1, \sigma_2$ are vectors, $\|\sigma_1\|$ is taken to be the supremum
norm (i.e. the absolute value of the largest component of $\sigma_1$)
and $\operatorname{rank} \sigma_2$ is the number of non-zero entries of $\sigma_2$, then
Eq. (6) remains true. The best-known example of a basis
consisting of vectors with small supremum norm is the Fourier
basis. Motivated by this analogy, we will refer to operator
bases fulfilling (3) as Fourier-type bases. Arguably, from a
mathematical point of view, they form the most natural setting
for low-rank matrix recovery.³

We will prove Theorem 3 for Fourier-type bases first and
then present two relatively simple modifications which allow
us to cover the general case.

In later sections we will refine the analysis for Fourier-type
bases, arriving at Theorem 4. Asymptotically, the estimate is
tight up to multiplicative constants.

Theorem 4 (Tighter bounds for Fourier-type bases). Let $\rho$ be
a rank-$r$ matrix and suppose that $\{w_a\}$ is an operator basis
fulfilling $\max_a \|w_a\|^2 \leq \frac{\nu}{n}$. Let $\Omega \subset [1, n^2]$ be a random set.
Then the solution $\sigma^\star$ to the optimization problem (1) is unique
and equal to $\rho$ with probability of failure smaller than $e^{-\beta}$,
provided that

$$|\Omega| \geq O(nr\nu(\beta + 1)\ln n).$$

Comparable bounds were known before in situations where
the operator basis itself was drawn randomly (as opposed to a
random subset from any given basis) [1] or under additional
assumptions on the spectrum of $\rho$ [6]. However, this seems to
be the first time the optimal log-factor in the bound on $|\Omega|$ has
been proven to be achievable in a matrix recovery problem,
where the involved basis and unknown matrix were neither
randomized nor subject to constraints beyond their rank.

B. Examples 

1) Matrix completion: We apply Theorem 3 to the special
case of matrix completion, as treated in [2], [3], [6]. Denote
the standard basis in $\mathbb{C}^n$ by $\{e_i\}_{i=1}^n$ and let $\{e_i e_j^\dagger\}_{i,j=1}^n$ be the
standard operator basis. Set $U = \operatorname{range} \rho$ and let $P_U$ be the
orthogonal projection onto $U$. Assume that $\rho$ fulfills

$$\max_i \|P_U e_i\|_2^2 \leq \mu_1 \frac{r}{n}, \qquad \max_{i,j} |\langle e_i, \operatorname{sgn} \rho\, e_j \rangle|^2 \leq \mu_2 \frac{r}{n^2}$$

(angle brackets refer to the standard inner product in $\mathbb{C}^n$).

³ To the best knowledge of the author, the first researcher who clearly
appreciated the significance of the basis' operator norm was Y.-K. Liu. He
proved that some of the bounds in [2] continue to hold for all low-rank
matrices, if - instead of matrix elements - one samples expansion coefficients
with respect to a certain unitary operator basis [13].

Because we work in the setting of Hermitian matrices, it
holds that

$$\langle e_i, \rho\, e_j \rangle = \overline{\langle e_j, \rho\, e_i \rangle},$$

so that every time one matrix element is revealed, we additionally
obtain knowledge of the transposed one. Accordingly, the
Hermitian analogue of sampling matrix elements is sampling
expansion coefficients with respect to the basis $\{w_a\}$ of
matrices of the form

$$\frac{1}{\sqrt{2}} (e_i e_j^\dagger + e_j e_i^\dagger), \qquad \frac{i}{\sqrt{2}} (e_i e_j^\dagger - e_j e_i^\dagger) \qquad (7)$$

for $i < j$, together with the matrices $e_i e_i^\dagger$ supported on the
main diagonal.

One now simply verifies

$$\max_a \|\mathcal{P}_T w_a\|_2^2 \leq 2\mu_1 \frac{r}{n}, \qquad \max_a (w_a, \operatorname{sgn} \rho)^2 \leq 2\mu_2 \frac{r}{n^2}.$$

Thus, Theorem 3 is applicable with $\nu = \max\{\mu_1, 2\mu_2\}$.
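The Hermitian analogue of the standard operator basis described in (7) can be built explicitly and checked for ortho-normality. This is a minimal sketch assuming NumPy; the function name `hermitian_elementary_basis` and the choice n = 4 are illustrative assumptions.

```python
import numpy as np

def hermitian_elementary_basis(n):
    """Diagonal units plus symmetric / antisymmetric pairs for i < j, cf. (7)."""
    basis = []
    for i in range(n):
        d = np.zeros((n, n), dtype=complex)
        d[i, i] = 1.0
        basis.append(d)
    for i in range(n):
        for j in range(i + 1, n):
            sym = np.zeros((n, n), dtype=complex)
            sym[i, j] = sym[j, i] = 1 / np.sqrt(2)
            asym = np.zeros((n, n), dtype=complex)
            asym[i, j] = 1j / np.sqrt(2)
            asym[j, i] = -1j / np.sqrt(2)
            basis.extend([sym, asym])
    return basis

n = 4
basis = hermitian_elementary_basis(n)

# Gram matrix of Hilbert-Schmidt inner products (w_a, w_b) = tr(w_a^dagger w_b).
gram = np.array([[np.trace(a.conj().T @ b) for b in basis] for a in basis])

# Largest squared operator norm among the basis elements.
max_op_norm_sq = max(np.linalg.norm(w, 2) ** 2 for w in basis)
print(len(basis), np.allclose(gram, np.eye(n * n)), max_op_norm_sq)
```

The diagonal elements have operator norm 1, so the operator-norm criterion (3) would only hold with a coherence of order n for this basis. This is why the two estimates (4) and (5), which depend on the matrix to be recovered, are the relevant conditions in the matrix completion setting.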

2) Unitary operator bases: We briefly comment on bases
with minimal operator norm. Let $\{w_a\}$ be an ortho-normal
basis in the space of matrices. At this point, we do not assume
that the basis is Hermitian. Denote the singular values of $w_a$
by $s_i(w_a)$. Since

$$1 = \|w_a\|_2^2 = \sum_{i=1}^n (s_i(w_a))^2,$$

it follows that $\|w_a\|^2 = \max_i s_i^2(w_a) \geq \frac{1}{n}$. Therefore $\nu = 1$ is
the best possible value in (3). It is achieved exactly if $\sqrt{n}\, w_a$
is unitary for every $a \in [1, n^2]$. Such unitary operator bases
have been studied in some detail (see e.g. [14]).

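A standard example with manifold applications is the Pauli
(operator) basis. For $n = 2$ it is given by $w_a = \frac{1}{\sqrt{2}} \sigma_a$, where

$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \sigma_4 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

are the Pauli matrices. The $\sigma_a$'s have eigenvalues in $\{\pm 1\}$ and
are thus both unitary and Hermitian. The Pauli basis $\{w_a^{(k)}\}$
for matrices acting on $(\mathbb{C}^2)^{\otimes k} \simeq \mathbb{C}^{2^k}$ is defined as the $k$-fold
tensor product basis with factors the $\{w_a\}$ above.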
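The tensor-product Pauli basis is easy to construct and test numerically. The sketch below assumes NumPy and the standard Pauli matrices; the function name `pauli_basis` and the choice k = 2 are illustrative assumptions. It checks ortho-normality with respect to the Hilbert-Schmidt inner product, that each rescaled element is unitary, and that the basis attains the optimal value 1 in the operator-norm criterion.

```python
import itertools
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
    np.eye(2, dtype=complex),
]

def pauli_basis(k):
    """k-fold tensor products of the normalized Pauli matrices sigma_a / sqrt(2)."""
    basis = []
    for combo in itertools.product(PAULIS, repeat=k):
        w = np.array([[1.0 + 0j]])
        for factor in combo:
            w = np.kron(w, factor / np.sqrt(2))
        basis.append(w)
    return basis

k = 2
n = 2 ** k
basis = pauli_basis(k)

# Gram matrix of Hilbert-Schmidt inner products.
gram = np.array([[np.trace(a.conj().T @ b) for b in basis] for a in basis])

# Smallest nu such that max_a ||w_a||^2 <= nu / n.
nu = n * max(np.linalg.norm(w, 2) ** 2 for w in basis)
print(len(basis), np.allclose(gram, np.eye(n * n)), nu)
```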

The bases $\{w_a^{(k)}\}$ possess an exceedingly rich structure
which is at the heart of many central results in quantum
information theory (see e.g. [15], [16], [17]; for a brief
introduction see [18]). We will make use of the existing theory
to prove lower bounds on $|\Omega|$ in Section III-C.

The Pauli basis is a commonly used ingredient in ex- 
perimental quantum-state tomography — a fact which initially 
motivated this work. 

C. Intuition 

The basic intuition underlying our results differs little from
previous approaches [1], [2], [3], [4]. For the sake of being
self-contained, we still give a brief non-technical account 
of some aspects we find essential. (Technical differences to 
existing publications are outlined in the next section.) 
Fig. 1. (a) The unknown matrix $\rho$ is an element of an $n^2$-dimensional linear
space. The axis labeled $\Omega$ represents all the coordinates of $\rho$ known to us. We
have no information about the projection of $\rho$ onto the orthogonal directions,
represented by the axes labeled $\Omega^\perp$. Thus the set of matrices compatible
with the coefficients known to us forms an affine space $A$, parts of which
are indicated in the figure. (b) The convex program (1) recovers $\rho$ if it is
the unique minimizer of the trace-norm restricted to $A$. This is certainly the
case if $A$ is contained in a supporting hyperplane at $\rho$ of the trace-norm ball
$B = \{\sigma \mid \|\sigma\|_1 \leq \|\rho\|_1\}$. In other words, there must be a normal vector $Y$
to a supporting hyperplane of $B$ at $\rho$, such that $Y$ is also normal to $A$. In
the language of convex optimization, $Y$ is referred to as a dual certificate.



Consider the sketch in Fig. 1(a) (partly inspired by [4]). The
matrix $\rho$ is an element of an $n^2$-dimensional linear space. The
axis labeled $\Omega$ in the diagram represents the roughly $O(rn)$
coordinates we have information about, i.e. the space spanned
by the $\{w_a \mid a \in \Omega\}$. As the $n^2 - O(rn)$ remaining coordinates
(denoted by $\Omega^\perp$) are unknown, there is a large affine space of
matrices compatible with the available information. We have
to specify an algorithm which picks one point from this high-dimensional
affine space, and prove that our choice is identical
to $\rho$ with high probability.

Since we are looking for a low-rank object, it would be
natural to choose the lowest-rank matrix in the affine space
of all matrices compatible with the information we have.
However, minimizing the rank over an affine space is in
general NP-hard [19]. To get around this problem, we employ
the trace heuristic, which stipulates that minimizing the trace-norm
is a good proxy for rank minimization (see e.g. [20],
[21]). The resulting optimization problem (1) is an efficiently
solvable semi-definite program.

The objective thus becomes proving that the trace-norm
restricted to the affine plane has a strict and global minimum
at $\rho$ (Fig. 1(b)). Thus, if $\rho + \Delta \neq \rho$ is any matrix in the affine
plane, we need to show that

$$\|\rho + \Delta\|_1 > \|\rho\|_1. \qquad (8)$$

A short handwaving argument indicates that adding a generic
deviation $\Delta$ to a low-rank $\rho$ is indeed likely to increase the
trace-norm.

To see why, recall that the trace-norm of a matrix is at least as large
as the sum of the absolute values of the elements on the main
diagonal [22]. We will apply this estimate to $\rho + \Delta$ expressed
in some eigenbasis of $\rho$. Let $\rho_1, \dots, \rho_r$ be the non-zero eigenvalues of
$\rho$. Then

$$\|\rho + \Delta\|_1 \geq \sum_{i=1}^r |\rho_i + \Delta_{i,i}| + \sum_{i=r+1}^n |\Delta_{i,i}|
\geq \|\rho\|_1 + \sum_{i=1}^r (\operatorname{sgn} \rho_i)\, \Delta_{i,i} + \sum_{i=r+1}^n |\Delta_{i,i}|. \qquad (9)$$

For generic deviations $\Delta$, we expect that the $\Delta_{i,i}$ all have
comparable magnitudes. Therefore, as long as $r \ll n$, the
second sum in (9) will dominate the first one as required.

The "only" difficulty faced in this paper consists in proving
that $\|\rho + \Delta\|_1 > \|\rho\|_1$ holds not just for generic matrices
$\rho + \Delta$ in the aforementioned affine plane, but for all such
elements simultaneously. Key to that will be a simple concept
from convex optimization theory: a dual certificate [23], [9],
[2], [3]. By that we mean a matrix $Y$ such that

$$\|\rho + \Delta\|_1 > \|\rho\|_1 + (Y, \Delta) \qquad (10)$$

for $\Delta \neq 0$. If we can find such a $Y$ which is also normal to
the affine plane (cf. Fig. 1(b)), then the inner product above
vanishes and (10) implies (8).

The main contribution of this work is an improved and
generalized construction of an (approximate) dual certificate
$Y$.
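The diagonal estimate behind (9) can be illustrated numerically. The sketch below assumes NumPy, a low-rank matrix that is diagonal in the standard basis (so that this basis is an eigenbasis) and a randomly generated Hermitian deviation; all names and values are illustrative assumptions. It checks that the trace-norm of the perturbed matrix dominates the sum of absolute diagonal entries, which in turn dominates the right-hand side of (9).

```python
import numpy as np

def trace_norm(a):
    return float(np.linalg.svd(a, compute_uv=False).sum())

rng = np.random.default_rng(2)
n, r = 10, 2

# Low-rank rho, diagonal in the standard basis, with non-zero eigenvalues 1 and -0.5.
rho = np.diag([1.0, -0.5] + [0.0] * (n - r))

# Generic Hermitian deviation Delta (assumed test data).
H = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
delta = 0.1 * (H + H.conj().T) / 2

perturbed = rho + delta
diag_sum = float(np.abs(np.diag(perturbed)).sum())

# Right-hand side of (9): ||rho||_1 + sum_{i<=r} sgn(rho_i) Delta_ii + sum_{i>r} |Delta_ii|.
d = np.real(np.diag(delta))
signs = np.sign(np.diag(rho)[:r])
lower = trace_norm(rho) + float((signs * d[:r]).sum()) + float(np.abs(d[r:]).sum())

print(trace_norm(perturbed) >= diag_sum >= lower)
```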



D. Novel approaches 

For readers well-accustomed to previous work, we briefly
list some main technical differences.

1) We employ an i.i.d. sampling process (sampling with
replacement) to choose the revealed coefficients. This
contrasts with the "Bernoulli" scheme used before [2],
[3].

2) At two different points in the proof (Section II-C,
Section II-F), we make use of a powerful large-deviation
estimate for matrix-valued observables. This (so far
under-appreciated?) operator Chernoff bound has been
proven in [24].

3) In the language of [2], when constructing a "dual
certificate"-type matrix $Y$ we note that it is sufficient
to demand that $\|\mathcal{P}_T Y - E\|_2$ be small, as opposed to zero
(Section II-E). The former is simpler to ascertain than
the latter.

4) We construct a particular matrix-valued random process
(descriptively called the "golfing scheme"), which
converges to the certificate $Y$ exponentially fast (Section II-F).

E. Previous versions of this result and some related work 

This work grew out of an effort to translate the results of [2],
[3] to the problem of quantum-state tomography, where bases
of Fourier-type matrices naturally occur. The project turned
out to lead to more general results than anticipated, producing
the methods presented in this paper.

We first published these results in [11], a short paper written
with a physics audience in mind. This pre-print contains all
the main ideas of the current work, and a complete proof
of Theorem 3 for Fourier-type bases (the case of interest in
quantum tomography). We announced in [11] that a more
detailed exposition of the new method, applying to the general
low-rank matrix recovery problem with respect to arbitrary
bases, was in preparation.

Before this extended version of [11] had been completed,
another pre-print [25] building on [11] appeared. The author
of [25] presents our methods in a language more suitable
for an audience from mathematics or information theory. He
also presents another special case of the results announced in
[11]: the reconstruction of low-rank matrices from randomly
sampled matrix elements. The main proof techniques in [25]
are identical to those of [11], with two exceptions. First,
the author independently found the same modification we are
using here to extend the methods from Fourier-type matrices
to bases with larger operator norm (his Lemma 3.6, our
Lemma 10). Second, his proof works more directly with non-Hermitian
matrices, and gives tighter bounds in the case of
non-square matrices.

A more detailed version of [11] focusing on physics issues
will appear elsewhere [26].

II. Main proof 

A. The ensemble 

Let $A_1, \dots, A_m$ be random variables taking values in
$[1, n^2]$. Their distribution will be specified momentarily. Important
objects in our analysis are the matrix-valued random
variables $w_{A_i}$. The sampling operator is

$$\mathcal{R} : \sigma \mapsto \frac{n^2}{m} \sum_{i=1}^m w_{A_i} (w_{A_i}, \sigma). \qquad (11)$$

Below, we will analyze the semi-definite program

$$\min \|\sigma\|_1 \quad \text{subject to } \mathcal{R}\sigma = \mathcal{R}\rho. \qquad (12)$$

If the $A_i$'s correspond to $m$ samples drawn from $[1, n^2]$
without replacement, the programs (1) and (12) are equivalent.
One can also consider the situation where the $A_i$'s are i.i.d.
random variables, describing sampling with replacement. Due
to independence, the latter situation is much easier to analyze.
Independence also implies the possibility of collisions⁴ (i.e.
$A_i = A_j$ for $i \neq j$). In the presence of collisions, fewer
than $m$ distinct coefficients will contribute to (12). It is thus
plausible (and will be confirmed below) that any upper bound
on the probability of failure of the i.i.d. scheme is also valid
for (1). From now on, we will therefore assume that the $A_i$'s
are independent and uniformly distributed.
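How likely collisions are under i.i.d. sampling can be quantified with the usual birthday-problem product formula. The sketch below uses only the Python standard library; the function name `prob_no_collision` and the parameter values (n = 100 and a number of draws of order n) are illustrative assumptions rather than values taken from the text.

```python
import math

def prob_no_collision(m, N):
    """Probability that m i.i.d. uniform draws from N items are all distinct."""
    if m > N:
        return 0.0
    # Product of (1 - i/N) for i = 0, ..., m-1, accumulated in log space.
    return math.exp(sum(math.log1p(-i / N) for i in range(m)))

# Classic birthday setting as a sanity check: 23 draws from 365 items.
p_birthday = prob_no_collision(23, 365)

# Matrix setting: N = n^2 basis elements and m of order n draws.
n = 100
p_matrix = prob_no_collision(10 * n, n ** 2)

print(p_birthday, p_matrix)
```

In the second example the probability of seeing no collision at all is negligible, which matches the remark that collisions should be expected once the number of samples grows faster than n.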

To state the obvious: the solution $\sigma^\star$ to (12) is unique and
equal to $\rho$ if and only if any non-zero deviation $\Delta = \sigma - \rho$
from $\rho$ is either infeasible

$$\mathcal{R}\Delta \neq 0, \qquad (13)$$

or causes the trace-norm to increase

$$\|\rho + \Delta\|_1 > \|\rho\|_1. \qquad (14)$$

The two conditions (13), (14) have a very different mathematical
flavor. Section II-C concentrates on the first one, while
the second one is more central in the remainder.

Using (13), one can give a simple proof of our earlier
remark that sampling with replacement can only decrease the
probability of recovering $\rho$:

⁴ By the "birthday paradox", such collisions are very likely to occur.
Fig. 2. The range of $\rho$ determines an orthogonal decomposition of the space
of matrices as sketched in the figure. The space $T$ is the set of matrices
$\sigma$ whose compression onto $\ker \rho$ vanishes (cf. Eq. (2)). With respect to an
eigenbasis of $\rho$, elements of $T$ are supported on the handle-shaped region
shown above.

Proof: Let $p_{\mathrm{with}}(m)$, $p_{\mathrm{wout}}(m)$ be the probabilities that
the solution of (12) equals $\rho$, if the $A_1, \dots, A_m$ are sampled,
respectively, with or without replacement.

Let $\mathcal{R}'$ be defined as in (11), but with the sum extending
only over distinct samples $A_i \neq A_j$ (denote the number
of distinct samples by $m'$). Then $\ker \mathcal{R}' = \ker \mathcal{R}$, and
consequently (13) is true for $\mathcal{R}$ iff it is true for $\mathcal{R}'$.

Thus, the probability that the solution to (12) equals $\rho$ is
the same as the probability that the solution of

$$\min \|\sigma\|_1 \quad \text{subject to } \mathcal{R}'\sigma = \mathcal{R}'\rho \qquad (15)$$

equals $\rho$. But, conditioned on any value of $m'$, the distribution
of $\mathcal{R}'$ is the same as the distribution of a sampling operator
drawing $m'$ basis elements without replacement. Hence

$$p_{\mathrm{with}}(m) = E_{m'}[p_{\mathrm{wout}}(m')] \leq p_{\mathrm{wout}}(m),$$

since $m' \leq m$ and clearly $p_{\mathrm{wout}}(m') \leq p_{\mathrm{wout}}(m)$. ∎
The i.i.d. scheme used in the present paper contrasts with
the "Bernoulli model" employed in previous works [10], [2],
[3]. There, every number $a \in [1, n^2]$ is included in $\Omega$ with
probability $m/n^2$. The slight advantage of our approach is
that the random variables $(w_{A_i}, \rho)$ are identically distributed,
in addition to being independent. Also, the random process
analyzed here never obtains knowledge of more than $m$ coefficients,
while this does happen in the Bernoulli model with
finite probability. On the downside, the possibility of incurring
collisions has some technical drawbacks, e.g. it means that $\mathcal{R}$
will in general not be proportional to a projection.

Note added: after the pre-print version of this paper had
been submitted, V. Nesme and the author noted that existing
arguments pertaining to sampling without replacement of real-valued
random variables [22, Chapter 12] remain valid in the
non-commutative case [27]. In particular, all large deviation
bounds derived below under the assumption of independently
chosen coefficients continue to hold for $A_i$'s sampled without
replacement. While we will not make use of these observations
in the present paper, we note that they can be used to slightly
improve the bounds given below. Details are in [27].

B. Further layout of proof and notation 

Following [2], [3], decompose $\Delta = \Delta_T + \Delta_T^\perp$, with $\Delta_T \in T$,
$\Delta_T^\perp \in T^\perp$ (see Fig. 2). (The reason for doing this will
become clear momentarily.)
The proof proceeds as follows:

1) In Section II-C we show that $\Delta$ is infeasible (fulfills
(13)) as soon as $\|\Delta_T\|_2$ is "much larger" than $\|\Delta_T^\perp\|_2$.

2) The previous statement utilizes a large-deviation bound
for operator-valued random variables, taken from [24].
We repeat the proof of this powerful tool in Section II-D.

3) We go on to show that

$$\|\rho + \Delta\|_1 \geq \|\rho\|_1 + (\operatorname{sgn} \rho + \operatorname{sgn} \Delta_T^\perp, \Delta)$$

in Section II-E. Thus, as soon as the scalar product on
the r.h.s. is positive, we conclude that $\Delta$ fulfills (14). We
then borrow a powerful idea from [2], [3], employing
a "dual certificate". More precisely, it is shown that
the aforementioned scalar product is guaranteed to be
positive, as long as there is a matrix $Y \in \operatorname{range} \mathcal{R}$ such
that (i) $\mathcal{P}_T Y$ is close to $\operatorname{sgn} \rho$, and (ii) $\|\mathcal{P}_T^\perp Y\|$ is small.

4) Section II-F establishes the existence of a certificate $Y$
in the case of bases with small operator norm. This
is probably the most (comparatively) difficult part of
the proof, and the one differing most from previous
approaches.

5) The construction of the previous section can be modified
to work with any operator basis. Details are given in
Section II-G. This completes the proof of the
main result.

6) In Sections III-A, III-B we introduce some martingale
techniques and put them to use to derive tighter bounds.

7) Section III-D deals with non-Hermitian matrices.

Throughout, we will use the notation $m = nr\kappa$. The
"oversampling factor" $\kappa$ describes the leverage we allow
ourselves by going beyond the minimum number of parameters
needed to describe $\rho$.

We use round parentheses $(\sigma_1, \sigma_2) = \operatorname{tr} \sigma_1^\dagger \sigma_2$ for the
Hilbert-Schmidt inner product, and angle brackets $\langle \psi, \phi \rangle$ for
the standard inner product on $\mathbb{C}^n$.

Let $s_i$ be the singular values of a matrix $\sigma$. The usual matrix
norms are

$$\|\sigma\| = \max_i s_i, \qquad \|\sigma\|_2 = (\sigma, \sigma)^{1/2} = \Big(\sum_i s_i^2\Big)^{1/2}, \qquad \|\sigma\|_1 = \operatorname{tr} |\sigma| = \sum_i s_i.$$

Both the identity matrix and the identity function on more
general spaces are denoted by $1$.

We will frequently encounter inequalities between matrices,
which are understood in the usual sense: $\sigma_1 \geq \sigma_2$ if and only
if $\sigma_1 - \sigma_2$ is positive semi-definite (a convention sometimes
referred to as matrix order or Löwner partial order).

As mentioned in the introduction (Section I-A), $\operatorname{sgn} \sigma$ is the
matrix resulting from the application of the sign function to
the eigenvalues of $\sigma$.

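C. First case: large $\Delta_T$

In this section, we show that $\Delta$ is infeasible (with high
probability) if $\Delta_T$ is much larger than $\Delta_T^\perp$.

If $\|\mathcal{R}\Delta_T\|_2 > \|\mathcal{R}\Delta_T^\perp\|_2$, then

$$\|\mathcal{R}\Delta\|_2 = \|\mathcal{R}\Delta_T + \mathcal{R}\Delta_T^\perp\|_2 \geq \|\mathcal{R}\Delta_T\|_2 - \|\mathcal{R}\Delta_T^\perp\|_2 > 0.$$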

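To find criteria for this situation to occur, we need to put a
lower bound on $\|\mathcal{R}\Delta_T\|_2$ and an upper bound on $\|\mathcal{R}\Delta_T^\perp\|_2$.
For the latter:

$$\|\mathcal{R}\Delta_T^\perp\|_2^2 = (\mathcal{R}\Delta_T^\perp, \mathcal{R}\Delta_T^\perp) \leq \|\mathcal{R}\|^2 \|\Delta_T^\perp\|_2^2. \qquad (16)$$

It is easy to see that $\|\mathcal{R}\|$ equals $n^2/m$ times the highest
number of collisions $C := \max_i |\{j \mid A_i = A_j\}|$. This
number, in turn, is certainly no larger than $m$ (a truly risk-averse
estimate). All in all:

$$\|\mathcal{R}\Delta_T^\perp\|_2 \leq n^2 \|\Delta_T^\perp\|_2. \qquad (17)$$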
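The statement about the operator norm of the sampling operator can be verified on a small example. The sketch below assumes NumPy and represents the sampling operator (11) as a matrix acting on vectorized n x n matrices; the random ortho-normal basis (columns of a random unitary) and the values n = 3, m = 12 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 12
N = n ** 2

# Random ortho-normal operator basis: the columns of a unitary on C^{n^2}
# play the role of the vectorized basis matrices w_a (assumed test data).
Z = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
W, _ = np.linalg.qr(Z)

# i.i.d. uniform samples A_1, ..., A_m (sampling with replacement).
A = rng.integers(0, N, size=m)

# Sampling operator as an N x N matrix: (n^2 / m) * sum_i |w_{A_i}><w_{A_i}|.
R = np.zeros((N, N), dtype=complex)
for a in A:
    w = W[:, a]
    R += np.outer(w, w.conj())
R *= N / m

# Highest number of collisions C = max_i |{j : A_i = A_j}|.
C = np.bincount(A, minlength=N).max()
op_norm = np.linalg.norm(R, 2)

print(op_norm, N / m * C)
```

Because the sampled basis elements are mutually orthogonal, the operator is diagonal in that basis with eigenvalues proportional to the multiplicities of the samples, which is why the largest eigenvalue is the prefactor times the highest multiplicity.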

Likewise,

$$\|\mathcal{R}\Delta_T\|_2^2 = (\mathcal{R}\Delta_T, \mathcal{R}\Delta_T) \geq \frac{n^2}{m} (\Delta_T, \mathcal{R}\Delta_T) = \frac{n^2}{m} (\Delta_T, \mathcal{P}_T \mathcal{R} \mathcal{P}_T \Delta_T) \geq \frac{n^2}{m} \big(1 - \|\mathcal{P}_T - \mathcal{P}_T \mathcal{R} \mathcal{P}_T\|\big) \|\Delta_T\|_2^2. \qquad (18)$$

This makes $\mathcal{P}_T \mathcal{R} \mathcal{P}_T$ an object of interest. Let $\mathcal{P}_{A_i}$ be the
(matrix-valued) orthogonal projection onto $w_{A_i}$. Then the
identity

$$E[\mathcal{R}] = \frac{n^2}{m} \sum_{i=1}^m E[\mathcal{P}_{A_i}] = 1$$

follows directly from the fact that the matrices $\{w_a\}$
form an ortho-normal basis by definition. We conclude that
$E[\mathcal{P}_T \mathcal{R} \mathcal{P}_T] = \mathcal{P}_T$. Thus, in order to evaluate (18), we need
to bound the deviation of $\mathcal{P}_T \mathcal{R} \mathcal{P}_T$ from its expectation value
$\mathcal{P}_T$ in operator norm for small $m$. In [2], this problem was
treated using a bound known as "Rudelson selection principle"
[28]. We will derive a similar bound in the next section, as a
corollary of the already mentioned large-deviation theorem for
matrix-valued random variables from [24]. The result (proven
in Section II-D below) reads:

Lemma 5. It holds that

$$\Pr\big[\|\mathcal{P}_T \mathcal{R} \mathcal{P}_T - \mathcal{P}_T\| \geq t\big] \leq 4nr \exp(\ldots) \qquad (19)$$

for all $t < 2$.
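The quantity controlled by Lemma 5 can be explored numerically. The sketch below assumes NumPy and, for simplicity, the (non-Hermitian) standard operator basis, whose vectorized elements are the unit vectors; the superoperator representation via Kronecker products, the helper `sampling_operator` and the values n = 4, r = 1, m = 40 are all illustrative assumptions. It checks that the projection onto T is an orthogonal projection of rank 2nr - r², that the expected sampling operator is the identity, and then evaluates the deviation for one random sample without asserting any particular size.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, m = 4, 1, 40
N = n ** 2
I = np.eye(n)

# Projector P onto the range of a random rank-r matrix (assumed test data).
G = rng.normal(size=(n, r)) + 1j * rng.normal(size=(n, r))
Qmat, _ = np.linalg.qr(G)
P = Qmat @ Qmat.conj().T

# Superoperator for sigma -> P sigma + sigma P - P sigma P on row-major
# vectorized matrices, using vec(A X B) = (A kron B^T) vec(X).
P_T = np.kron(P, I) + np.kron(I, P.T) - np.kron(P, P.T)

def sampling_operator(samples):
    """(N / m) * sum_i |A_i><A_i| for the standard operator basis."""
    counts = np.bincount(samples, minlength=N)
    return (N / len(samples)) * np.diag(counts).astype(complex)

# Exact expectation for a single uniform sample: averages to the identity.
expected_R = sum(sampling_operator(np.array([a])) for a in range(N)) / N

samples = rng.integers(0, N, size=m)
R = sampling_operator(samples)
deviation = np.linalg.norm(P_T @ R @ P_T - P_T, 2)
print(deviation)
```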

We assume in the following that the deviation controlled by (19)
satisfies $\|\mathcal{P}_T \mathcal{R} \mathcal{P}_T - \mathcal{P}_T\| < 1/2$ (the case $t = 1/2$).
Denote the probability of that event not occurring by $p_1$.
(Many statements in this proof will hold only up to a small
probability of failure. We will defer an explicit calculation of
these failure probabilities until the very end of the argument,
when all parameters have been chosen.) Then, using (17), (18),
we have that $\mathcal{R}\Delta \neq 0$ if

$$\frac{n^2}{2m} \|\Delta_T\|_2^2 > n^4 \|\Delta_T^\perp\|_2^2 \iff \|\Delta_T\|_2^2 > 2mn^2 \|\Delta_T^\perp\|_2^2.$$

For the next sections, it is thus sufficient to treat the case of

$$\|\Delta_T\|_2 < \sqrt{2m}\, n\, \|\Delta_T^\perp\|_2 < n^2 \|\Delta_T^\perp\|_2. \qquad (20)$$
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Remark: Repeating the calculations in this section without 
the trivial estimate C < m, th e last coefficient in ( f20l > can be 
improved from n 2 to J 2C2n , Since C is O(lnn) with very 



high probability, this would look like a major improvement. 
However, because only the logarithm of the coefficient enters 
our final estimate of the number of samples required, we will 
content ourselves with n 2 on the grounds that it is a simpler 
expression. 

D. Operator large deviation bounds 

The material in the first paragraph below is taken from
[24]. We repeat the argument to make the presentation self-contained.
It is an elementary - yet very powerful - large
deviation bound for matrix-valued random variables. The basic
recipe is this: take a textbook proof of Bernstein's inequality
and substitute all inequalities between real numbers by matrix
inequalities (in the sense of matrix order, see Sec. II-B).

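We start by giving a basic Markov-inequality. Let $\theta$ be the
"operator step function" defined by

$$\theta(\sigma) = \begin{cases} 0 & \sigma \leq \mathbb{1}, \\ 1 & \sigma \not\leq \mathbb{1}. \end{cases}$$

If $\sigma$ is positive semi-definite, the trivial estimate $\theta(\sigma) \leq \operatorname{tr}\sigma$
holds. Thus, for any number $\lambda > 0$ and matrix-valued random
variable $S$:

$$\begin{aligned}
\Pr[S \not\leq t\mathbb{1}] &= \Pr[S - t\mathbb{1} \not\leq 0] = \Pr\bigl[e^{\lambda S - \lambda t\mathbb{1}} \not\leq \mathbb{1}\bigr] \\
&= \mathbb{E}\bigl[\theta(e^{\lambda S - \lambda t\mathbb{1}})\bigr] \leq \mathbb{E}\bigl[\operatorname{tr} e^{\lambda S - \lambda t\mathbb{1}}\bigr] = e^{-\lambda t}\,\mathbb{E}\bigl[\operatorname{tr} e^{\lambda S}\bigr].
\end{aligned} \tag{21}$$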



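Now let $X$ be an operator-valued random variable, $X_i$ be i.i.d.
copies of $X$, and $S = \sum_{i=1}^m X_i$. Then

$$\begin{align}
&\mathbb{E}\Bigl[\operatorname{tr}\exp\Bigl(\lambda\sum_{i=1}^{m} X_i\Bigr)\Bigr] \notag\\
&\quad\leq \mathbb{E}\Bigl[\operatorname{tr}\exp\Bigl(\lambda\sum_{i=1}^{m-1} X_i\Bigr)\exp(\lambda X_m)\Bigr] \notag\\
&\quad= \operatorname{tr}\,\mathbb{E}\Bigl[\exp\Bigl(\lambda\sum_{i=1}^{m-1} X_i\Bigr)\Bigr]\,\mathbb{E}[\exp(\lambda X)] \notag\\
&\quad\leq \mathbb{E}\Bigl[\operatorname{tr}\exp\Bigl(\lambda\sum_{i=1}^{m-1} X_i\Bigr)\Bigr]\,\bigl\|\mathbb{E}[\exp(\lambda X)]\bigr\| \notag\\
&\quad\leq \dots \leq \mathbb{E}[\operatorname{tr}\exp(\lambda X_1)]\,\bigl\|\mathbb{E}[\exp(\lambda X)]\bigr\|^{m-1} \tag{22}\\
&\quad\leq n\,\bigl\|\mathbb{E}[e^{\lambda X}]\bigr\|^{m}, \tag{23}
\end{align}$$

where the second line is the Golden-Thompson inequality [29].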

Reference [24] now goes on to derive a Chernoff-Hoeffding-type
inequality for bounded $X_i \in [0, \mathbb{1}]$. We find it slightly
more convenient to work with a Bernstein-type estimate,
bounding Eq. (23) by the second moments of the $X_i$. (The
derivation in the next paragraphs is influenced by the proofs
of the commutative version in [30], [31].)

Indeed, assume that $\mathbb{E}[Y] = 0$ and $\|Y\| \leq 1$ for some
random variable $Y$. Recall the standard estimate

$$1 + y \leq e^y \leq 1 + y + y^2$$

valid for real numbers $y \in [-1, 1]$ (and, strictly speaking, a bit
beyond). From the upper bound, we get $e^Y \leq \mathbb{1} + Y + Y^2$, as
both sides of the inequality are simultaneously diagonalizable.
Taking expectations and employing the lower bound:

$$\mathbb{E}[e^Y] \leq \mathbb{1} + \mathbb{E}[Y^2] \leq \exp(\mathbb{E}[Y^2]), \tag{24}$$

and thus $\|\mathbb{E}[e^Y]\| \leq \|\exp(\mathbb{E}[Y^2])\| = \exp(\|\mathbb{E}[Y^2]\|)$.
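The scalar estimate behind (24) is easy to check numerically. The sketch below tests $1 + y \leq e^y \leq 1 + y + y^2$ on a grid over $[-1, 1]$; the grid size and the small rounding tolerance are arbitrary choices made for this illustration. Since a Hermitian $Y$ with $\|Y\| \leq 1$ has all eigenvalues in $[-1, 1]$, the same check applies eigenvalue by eigenvalue to the operator inequality.

```python
import math

def check_exponential_sandwich(num_points=2001, tol=1e-12):
    """Check 1 + y <= exp(y) <= 1 + y + y^2 on a uniform grid over [-1, 1]."""
    for k in range(num_points):
        y = -1.0 + 2.0 * k / (num_points - 1)
        lower = 1.0 + y
        upper = 1.0 + y + y * y
        value = math.exp(y)
        if value < lower - tol or value > upper + tol:
            return False
    return True

print(check_exponential_sandwich())
```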

These are all essential ingredients for the following theorem, 
summarizing the results from this section. 

Theorem 6 (Operator-Bernstein inequality). Let $X_i$, $i = 1, \dots, m$
be i.i.d., zero-mean, Hermitian matrix-valued random variables.
Assume $V_0, c \in \mathbb{R}$ are such that $\|\mathbb{E}[X_i^2]\| \leq V_0^2$
and $\|X_i\| \leq c$. Set $S = \sum_{i=1}^m X_i$ and let $V^2 = mV_0^2$ (an
upper bound to the variance of $S$). Then

$$\Pr[\|S\| > t] \leq 2n\exp\Bigl(-\frac{t^2}{4V^2}\Bigr) \tag{25}$$

for $t \leq 2V^2/c$, and

$$\Pr[\|S\| > t] \leq 2n\exp\Bigl(-\frac{t}{2c}\Bigr) \tag{26}$$

for larger values of $t$.

The second equation will be used only once, in Section III-B.

Proof: Combine Eqs. (21), (23), (24) to get the estimate

$$\Pr[S \not\leq t\mathbb{1}] \leq n\exp\bigl(-\lambda t + \lambda^2 mV_0^2\bigr).$$

Let $s = t/V$ be the deviation in units of $V$. Then

$$\Pr[S \not\leq sV\mathbb{1}] \leq n\exp\bigl(-\lambda sV + \lambda^2V^2\bigr).$$

Choose $\lambda = s/(2V)$. The exponent becomes

$$-s^2/2 + s^2/4 = -s^2/4,$$

valid as long as $\lambda\|X\| \leq 1$, which is certainly fulfilled if

$$s \leq \frac{2V}{c}. \tag{27}$$

If (27) does not hold, set $\lambda = 1/c$ and compute for the
exponent

$$-sV/c + V^2/c^2 = -sV/(2c) - \bigl(sV/(2c) - V^2/c^2\bigr) \leq -sV/(2c) = -t/(2c).$$

The same estimates hold for — S, giving the advertised 
bound with the factor of 2 coming from the union bound 
(which is also known as Boole's inequality: the probability 
of at least one of a set of events occurring is not larger than 
the sum of their individual probabilities). ■ 

Note that for n = 1, we recover the standard Bernstein 
inequality, which we will also have the occasion to use. 
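To make the two regimes of the theorem concrete, the following sketch evaluates the right-hand sides of (25) and (26). It assumes the reading $V^2 = mV_0^2$ with threshold $t \leq 2V^2/c$, which is what the proof above yields; the function name and argument names are hypothetical.

```python
import math

def operator_bernstein_bound(t, n, m, v0_sq, c):
    """Tail bound on Pr[||S|| > t] for a sum S of m i.i.d. n x n matrices.

    v0_sq is an upper bound on ||E[X^2]|| and c an upper bound on ||X||.
    """
    v_sq = m * v0_sq                                   # V^2 = m * V_0^2
    if t <= 2.0 * v_sq / c:                            # regime of Eq. (25)
        return 2.0 * n * math.exp(-t * t / (4.0 * v_sq))
    return 2.0 * n * math.exp(-t / (2.0 * c))          # regime of Eq. (26)

print(operator_bernstein_bound(t=2.0, n=1, m=4, v0_sq=0.25, c=1.0))
print(operator_bernstein_bound(t=4.0, n=1, m=4, v0_sq=0.25, c=1.0))
```

At the crossover $t = 2V^2/c$ both expressions equal $2n\exp(-V^2/c^2)$, so the bound is continuous in $t$.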

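We are in a position to supply the deferred proof of
Lemma 5. Recall that it was claimed that

$$\Pr\bigl[\|\mathcal{P}_T\mathcal{R}\mathcal{P}_T - \mathcal{P}_T\| \geq t\bigr] \leq 4nr\exp\Bigl(-\frac{t^2\kappa}{8\nu}\Bigr)$$

for all $t < 2$.

Proof (of Lemma 5): For $a \in [1, n^2]$, let $\mathcal{P}_a$ be the
orthogonal projection onto $w_a$. We define a family of linear
operators $Z_a$ by

$$Z_a = \frac{n^2}{m}\mathcal{P}_T\mathcal{P}_a\mathcal{P}_T.$$

Then

$$\mathcal{P}_T\mathcal{R}\mathcal{P}_T = \sum_{i=1}^m Z_{A_i}.$$

Since $\mathbb{E}[Z_{A_i}] = \frac{1}{m}\mathcal{P}_T$, the operator whose norm we want to
bound can be written as

$$\mathcal{P}_T\mathcal{R}\mathcal{P}_T - \mathcal{P}_T = \sum_{i=1}^m \bigl(Z_{A_i} - \mathbb{E}[Z_{A_i}]\bigr).$$

We will thus apply the Operator Bernstein inequality to the
random variables $X_{A_i} := Z_{A_i} - \mathbb{E}[Z_{A_i}]$. To this end, we
need to estimate the constants $V_0, c$ appearing in Theorem 6.
Compute:

$$Z_{A_i}^2 = \frac{n^2}{m}(w_{A_i}, \mathcal{P}_T w_{A_i})\,Z_{A_i}.$$

From the incoherence bound on $\|\mathcal{P}_T w_a\|_2$ we get $(w_{A_i}, \mathcal{P}_T w_{A_i}) \leq 2\nu\frac{r}{n}$ and thus

$$\mathbb{E}[Z_{A_i}^2] \leq \frac{2\nu n r}{m}\,\mathbb{E}[Z_{A_i}] = \frac{2\nu n r}{m^2}\,\mathcal{P}_T,$$

having used that $Z_{A_i} \geq 0$ (matrix order). Hence

$$\|\mathbb{E}[X_{A_i}^2]\| = \bigl\|\mathbb{E}[Z_{A_i}^2] - \mathbb{E}[Z_{A_i}]^2\bigr\| \leq \frac{2\nu n r - 1}{m^2}\|\mathcal{P}_T\| < \frac{2\nu n r}{m^2} = \frac{2\nu}{m\kappa} =: V_0^2.$$

Next:

$$\|X_{A_i}\| = \frac{1}{m}\bigl\|n^2\mathcal{P}_T\mathcal{P}_{A_i}\mathcal{P}_T - \mathcal{P}_T\bigr\| \leq \frac{1}{m}\bigl\|n^2\mathcal{P}_T\mathcal{P}_{A_i}\mathcal{P}_T\bigr\| = \frac{n^2}{m}\|\mathcal{P}_T w_{A_i}\|_2^2 \leq \frac{n^2}{m}\,2\nu\frac{r}{n} = \frac{2\nu n r}{m} = \frac{2\nu}{\kappa} =: c,$$

so that

$$\frac{2mV_0^2}{c} = \frac{4\nu}{\kappa}\cdot\frac{\kappa}{2\nu} = 2.$$

The claim follows from Theorem 6.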

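E. Second case: small $\Delta_T$

In this section, we will show that

$$\|\Delta_T\|_2 < n^2\|\Delta_T^\perp\|_2, \tag{28}$$
$$\Delta \in (\operatorname{range}\mathcal{R})^\perp \tag{29}$$

together imply $\|\rho + \Delta\|_1 > \|\rho\|_1$, if we can find a "certificate"
$Y \in \operatorname{range}\mathcal{R}$ with certain properties. The basic line of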
argument is similar to the one given in Section 3 of 0. 

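Set $U = \operatorname{range}\rho$ and let $P_U$ be the orthogonal projection
onto $U$. We will make repeated use of the basic identity

$$\|\sigma\|_1 = \operatorname{tr}|\sigma| = \operatorname{tr}((\operatorname{sgn}\sigma)\,\sigma) = (\operatorname{sgn}\sigma, \sigma)$$

(recall the definition of sgn from Section II-A). We then find

$$\begin{align}
\|\rho + \Delta\|_1 &\geq \|P_U(\rho + \Delta)P_U\|_1 + \|P_U^\perp(\rho + \Delta)P_U^\perp\|_1 \tag{30}\\
&= \|\rho + P_U\Delta P_U\|_1 + \|\Delta_T^\perp\|_1 \notag\\
&\geq (\operatorname{sgn}\rho, \rho + P_U\Delta P_U) + (\operatorname{sgn}\Delta_T^\perp, \Delta_T^\perp) \tag{31}\\
&= \|\rho\|_1 + (\operatorname{sgn}\rho, P_U\Delta P_U) + (\operatorname{sgn}\Delta_T^\perp, \Delta_T^\perp) \notag\\
&= \|\rho\|_1 + (\operatorname{sgn}\rho + \operatorname{sgn}\Delta_T^\perp, \Delta). \tag{32}
\end{align}$$

The estimate (30) is sometimes known as the "pinching
inequality" ([12], Problem II.5.4), and in line (31) we used
Hölder's inequality: $(\sigma_1, \sigma_2) \leq \|\sigma_1\|\,\|\sigma_2\|_1$.

To conclude that $\|\rho + \Delta\|_1 > \|\rho\|_1$, it is hence sufficient to
show that $(\operatorname{sgn}\rho + \operatorname{sgn}\Delta_T^\perp, \Delta) > 0$. Choose any $Y \in \operatorname{range}\mathcal{R}$.
Using (29):

$$(\operatorname{sgn}\rho + \operatorname{sgn}\Delta_T^\perp, \Delta) = (\operatorname{sgn}\rho + \operatorname{sgn}\Delta_T^\perp - Y, \Delta). \tag{33}$$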
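Assume that $Y$ fulfills

$$\|\mathcal{P}_T Y - \operatorname{sgn}\rho\|_2 \leq \frac{1}{2n^2}, \qquad \|\mathcal{P}_T^\perp Y\| \leq \frac{1}{2}. \tag{34}$$

Then (33) becomes

$$\begin{aligned}
(\operatorname{sgn}\rho + \operatorname{sgn}\Delta_T^\perp - Y, \Delta)
&= (\operatorname{sgn}\rho - Y, \Delta_T) + (\operatorname{sgn}\Delta_T^\perp - Y, \Delta_T^\perp) \\
&\geq \frac{1}{2}\|\Delta_T^\perp\|_1 - \frac{1}{2n^2}\|\Delta_T\|_2 \\
&\geq \frac{1}{2}\|\Delta_T^\perp\|_2 - \frac{1}{2n^2}\|\Delta_T\|_2 > 0,
\end{aligned}$$

where the last step uses (28).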

We summarize. Assume there is a certificate $Y \in \operatorname{range}\mathcal{R}$
fulfilling (34). Let $\sigma^*$ be the solution of the optimization
problem, let $\Delta^* = \rho - \sigma^*$. Then $\Delta^*$ must fulfill (29), for else it
would be infeasible. It must also fulfill (28), by Section II-C.
But then, from the previous calculation, $(\Delta^*)_T^\perp$ must be zero,
as otherwise $\|\sigma^*\|_1 > \|\rho\|_1$. This implies that $(\Delta^*)_T$ is also
zero, again using (28). So $\Delta^*$ is zero, and therefore $\sigma^* = \rho$
is the unique solution to (12).

It remains to prove the existence of the certificate Y. 

F. The certificate: bases of Fourier type 

In this section, we construct a $Y \in \operatorname{range}\mathcal{R}$ with

$$\|\mathcal{P}_T Y - \operatorname{sgn}\rho\|_2 \leq \frac{1}{2n^2}, \qquad \|\mathcal{P}_T^\perp Y\| \leq \frac{1}{2}, \tag{35}$$

assuming that $\max_a \|w_a\|^2 \leq \frac{\nu}{n}$. A modified proof valid in
the general case will be given in Section II-G. In previous
approaches to matrix completion, this step was the most
involved, covering dozens of pages. We present a strongly
simplified proof using two key ideas: a further application
of the operator Bernstein inequality; and a certain, recursive
random process which quickly converges to the sought-for $Y$.

1) Intuition: A first, natural ansatz for finding $Y$ could be
as follows. Define

$$X_a = \frac{n^2}{m}\,w_a(w_a, \operatorname{sgn}\rho), \qquad Y = \sum_{i=1}^m X_{A_i}. \tag{36}$$

It is obvious that $Y$ is in the range of $\mathcal{R}$ and that its expectation
value (equal to $\operatorname{sgn}\rho$) fulfills the conditions in (35). What is
more, the operator Chernoff bound can be used to control the
deviation of $Y$ from that expected value - so there is hope
that we have found a solution. However, a short calculation
shows that convergence is (barely) too slow for our purposes.

Intuitively, it is easy to see what is "wrong" with the
previous random process. Assume we sample $k < m$ basis
elements. Employing (36), our general "best guess" at this
point for a matrix $Y_1$ which resembles $\operatorname{sgn}\rho$ on $T$ (i.e. with
$\|\mathcal{P}_T Y_1 - \operatorname{sgn}\rho\|_2$ "small") would be

$$Y_1 = \frac{n^2}{k}\sum_{i=1}^{k} w_{A_i}(w_{A_i}, \operatorname{sgn}\rho).$$




Fig. 3. Caricature of the "golfing scheme" used to construct the certificate.
In the $i$th step, $X_{i-1}$ designates the vector we aim to represent. The
approximation of $X_{i-1}$ actually obtained is $\mathcal{P}_T\mathcal{R}_i X_{i-1}$. The distance of
the new goal $X_i = X_{i-1} - \mathcal{P}_T\mathcal{R}_i X_{i-1}$ to the origin is guaranteed to be
only half the previous one. The sequence $X_i$ thus converges exponentially
fast to the origin.



Now given this information, the matrix we really should be
approximating in the next steps is $\mathcal{P}_T(\operatorname{sgn}\rho - Y_1)$. The process
(36), in contrast, does not update its "future strategy based on
past results". Trying to perform better, we will draw a further
batch of $k$ coefficients and set

$$Y_2 = Y_1 + \frac{n^2}{k}\sum_{i=k+1}^{2k} w_{A_i}(w_{A_i}, \operatorname{sgn}\rho - \mathcal{P}_T Y_1).$$

The sequence $\mathcal{P}_T Y_i$ will be shown to converge exponentially
fast to $\operatorname{sgn}\rho$. For reasons which should be all too obvious from
Fig. 3, we will call this adapted strategy the golfing scheme.

On the one hand, the size $k$ of the batches will have
to be chosen large enough to allow for the application of
the operator large-deviation bounds tailored for independent
random variables. On the other hand, $k$ must not be too large,
as the speed of convergence is exponential in $l = m/k$.

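2) Proof: Before supplying the details of this scheme, we
state a lemma which will allow us to control the operator
norm $\|\mathcal{P}_T^\perp Y_i\|$ of the approximations. The operator-Bernstein
inequality makes this, once again, a simple calculation.

Lemma 7. Let $F \in T$. Then

$$\Pr\bigl[\|\mathcal{P}_T^\perp\mathcal{R}F\| > t\bigr] \leq 2n\exp\Bigl(-\frac{t^2\kappa r}{4\nu\|F\|_2^2}\Bigr)$$

for $t \leq \sqrt{2/r}\,\|F\|_2$, and

$$\Pr\bigl[\|\mathcal{P}_T^\perp\mathcal{R}F\| > t\bigr] \leq 2n\exp\Bigl(-\frac{t\sqrt{r}\,\kappa}{2\sqrt{2}\,\nu\|F\|_2}\Bigr)$$

for larger values of $t$.

Proof: It suffices to treat the case where $\|F\|_2 = 1$. Set

$$X_a = \frac{n^2}{m}\mathcal{P}_T^\perp w_a(w_a, F).$$

Then $\sum_{i=1}^m X_{A_i} = \mathcal{P}_T^\perp\mathcal{R}F$, and

$$\mathbb{E}[X_{A_i}] = \frac{1}{m}\mathcal{P}_T^\perp F = 0.$$

Using (3) and the fact that $\|\mathcal{P}_T^\perp w_a\| \leq \|w_a\|$ we estimate the
variance:

$$\begin{align}
\|\mathbb{E}[X_{A_i}^2]\| &\leq \frac{n^2}{m^2}\sum_a (w_a, F)^2\,\bigl\|(\mathcal{P}_T^\perp w_a)^2\bigr\| \tag{37}\\
&\leq \frac{n^2}{m^2}\frac{\nu}{n}\|F\|_2^2 = \frac{\nu}{m\kappa r} =: V_0^2. \notag
\end{align}$$

Next,

$$\|X_{A_i}\| \leq \frac{n^2}{m}\sqrt{\frac{\nu}{n}}\sqrt{\frac{2\nu r}{n}} = \frac{n\nu\sqrt{2r}}{m} = \frac{\sqrt{2}\,\nu}{\sqrt{r}\,\kappa},$$

so that

$$2mV_0^2/\|X_{A_i}\| \geq \frac{2\nu}{\kappa r}\cdot\frac{\sqrt{r}\,\kappa}{\sqrt{2}\,\nu} = \sqrt{2/r}.$$

Now use Theorem 6. ■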
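We sample $l$ batches of basis elements, the $i$th set consisting
of $m_i = \kappa_i r n$ matrices.
For $1 \leq i \leq l$, let

$$\mathcal{R}_i : \sigma \mapsto \frac{n^2}{m_i}\sum_{j = m_1 + \dots + m_{i-1} + 1}^{m_1 + \dots + m_i} w_{A_j}(w_{A_j}, \sigma)$$

be the sampling operator associated with the $i$th batch and set

$$X_0 = \operatorname{sgn}\rho, \qquad Y_i = \sum_{j=1}^{i}\mathcal{R}_j X_{j-1}, \qquad X_i = \operatorname{sgn}\rho - \mathcal{P}_T Y_i$$

(see Fig. 3). From this, we get

$$X_i = (\mathbb{1} - \mathcal{P}_T\mathcal{R}_i\mathcal{P}_T)(\mathbb{1} - \mathcal{P}_T\mathcal{R}_{i-1}\mathcal{P}_T)\cdots(\mathbb{1} - \mathcal{P}_T\mathcal{R}_1\mathcal{P}_T)X_0. \tag{38}$$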
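The recursion can be traced on a toy instance. The sketch below is a simplified illustration under explicit assumptions, not the construction analyzed in the text: it takes a real rank-one $\rho = uu^T$ with a unit vector $u$ (so that $\operatorname{sgn}\rho = uu^T$ is also the projector onto $U$), uses the matrix units $e_a e_b^T$ as the basis (the matrix-completion case, which is not Hermitian), and picks the batch size arbitrarily. All helper names are hypothetical. Besides running the scheme, it records how far each step deviates from the one-step form of (38), $X_i = (\mathbb{1} - \mathcal{P}_T\mathcal{R}_i\mathcal{P}_T)X_{i-1}$, which should vanish up to rounding.

```python
import random

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def axpy(a, b, s=1.0):
    """Entrywise a + s * b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def norm2(a):
    """Frobenius (2-)norm."""
    return sum(x * x for row in a for x in row) ** 0.5

def proj_T(p, s):
    """P_T s = p s + s p - p s p, with p the projector onto range(rho)."""
    ps = matmul(p, s)
    return axpy(axpy(ps, matmul(s, p)), matmul(ps, p), -1.0)

def sample_op(s, idx):
    """R_i s = (n^2 / m_i) * sum_j w_j (w_j, s) for matrix units w_j = e_a e_b^T."""
    n = len(s)
    out = [[0.0] * n for _ in range(n)]
    scale = n * n / len(idx)
    for a, b in idx:
        out[a][b] += scale * s[a][b]
    return out

def golfing(u, num_batches, batch_size, seed=0):
    """Run the golfing recursion for rho = u u^T (u a real unit vector)."""
    rng = random.Random(seed)
    n = len(u)
    sgn_rho = [[u[i] * u[j] for j in range(n)] for i in range(n)]
    p = sgn_rho                    # rank one: sgn(rho) is the projector onto U
    x = sgn_rho                    # X_0 = sgn(rho)
    y = [[0.0] * n for _ in range(n)]
    norms, residuals = [norm2(x)], []
    for _ in range(num_batches):
        idx = [(rng.randrange(n), rng.randrange(n)) for _ in range(batch_size)]
        y = axpy(y, sample_op(x, idx))                 # Y_i = Y_{i-1} + R_i X_{i-1}
        x_new = axpy(sgn_rho, proj_T(p, y), -1.0)      # X_i = sgn(rho) - P_T Y_i
        # one-step form of Eq. (38): X_i = (1 - P_T R_i P_T) X_{i-1}
        x_pred = axpy(x, proj_T(p, sample_op(proj_T(p, x), idx)), -1.0)
        residuals.append(norm2(axpy(x_new, x_pred, -1.0)))
        x = x_new
        norms.append(norm2(x))
    return y, norms, residuals

y, norms, residuals = golfing([0.5, 0.5, 0.5, 0.5], num_batches=5, batch_size=200)
print(norms)       # ||X_i||_2 per step; no particular values are claimed here
print(residuals)   # deviation from the one-step form of Eq. (38)
```

How quickly `norms` decays depends on the random draws and the batch size; the text's guarantees concern the parameter choices made below, not this toy setting.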

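Assume that in the $i$th run

$$\|(\mathbb{1} - \mathcal{P}_T\mathcal{R}_i\mathcal{P}_T)X_{i-1}\|_2 \leq c_i\|X_{i-1}\|_2. \tag{39}$$

Denote the probability of this event not occurring by $p_2(i)$
(recall that $p_1$ has been defined in Section II-C). Clearly, if
(39) does hold for all $i$, then

$$\|X_i\|_2 = \|(\mathbb{1} - \mathcal{P}_T\mathcal{R}_i\mathcal{P}_T)X_{i-1}\|_2 \leq c_i\|X_{i-1}\|_2,$$

so that $\|X_i\|_2 \leq \sqrt{r}\prod_{j=1}^{i} c_j$.

Assume further that for all $i$ the estimate

$$\|\mathcal{P}_T^\perp\mathcal{R}_i X_{i-1}\| \leq t_i\|X_{i-1}\|_2$$

is true, with $p_3(i)$ bounding the probability of failure.
Then

$$\|\mathcal{P}_T^\perp Y_i\| \leq \sum_{j=1}^{i}\|\mathcal{P}_T^\perp\mathcal{R}_j X_{j-1}\| \leq \sum_{j=1}^{i} t_j\|X_{j-1}\|_2.$$

A first simple choice of parameters (to be refined in Section III-B) is

$$c_i = 1/2, \qquad t_i = 1/(4\sqrt{r}), \qquad \kappa_i = 64\nu\bigl(\ln(4nr) + \ln(2l) + \beta\ln n\bigr)$$

for some $\beta > 0$. It follows that

$$\|X_i\|_2 \leq \sqrt{r}\,2^{-i}, \qquad \|\mathcal{P}_T^\perp Y_l\| \leq \frac{1}{4}\sum_{i=1}^{l} 2^{-(i-1)} < \frac{1}{2}.$$

With $l = \lceil\log_2(2n^2\sqrt{r})\rceil$, the conditions in Equation (35) are
met. Using Lemma 5 and Lemma 7 the failure probabilities
become

$$p_1 \leq 4nr\exp\Bigl(-\frac{\kappa}{32\nu}\Bigr), \qquad p_2(i) \leq 4nr\exp\Bigl(-\frac{\kappa_i}{32\nu}\Bigr), \qquad p_3(i) \leq 2n\exp\Bigl(-\frac{\kappa_i}{64\nu}\Bigr),$$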

all of which are bounded above by ^jfi - ' 3 . Theorem [3] for 
Fourier-type bases thus follows from a simple application of 
the union bound. The number of coefficients sampled must
exceed

$$m = l\,\kappa_i\,rn = 64\nu\bigl(\ln(4nr) + \ln(2l) + \beta\ln n\bigr)\log_2(2n^2\sqrt{r})\,rn = O\bigl(rn\nu(1 + \beta)\ln^2 n\bigr).$$
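For orientation, the parameter choice above can be evaluated for concrete sizes. The helper below is a hypothetical convenience function, not part of the argument: it returns the number of batches $l = \lceil\log_2(2n^2\sqrt{r})\rceil$, the oversampling factor $\kappa_i = 64\nu(\ln(4nr) + \ln(2l) + \beta\ln n)$ and the resulting total $l\,\kappa_i\,rn$.

```python
import math

def golfing_parameters(n, r, nu, beta):
    """Batch count l, per-batch factor kappa_i and total l * kappa_i * r * n."""
    l = math.ceil(math.log2(2 * n**2 * math.sqrt(r)))
    kappa = 64 * nu * (math.log(4 * n * r) + math.log(2 * l) + beta * math.log(n))
    total = l * kappa * r * n
    return l, kappa, total

l, kappa, total = golfing_parameters(n=16, r=1, nu=1.0, beta=1.0)
print(l, kappa, total)
```

For small $n$ the total exceeds the $n^2$ coefficients of the full expansion, which reflects the large constants of this first, unrefined estimate.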

3) Discussion: The "golfing scheme" above could be described
as a "sequential" way of building the certificate vector:
every time we sample a basis element $w_a$, we assign a coefficient
$c_a = (w_a, X_i)$ to it, but never alter our previous choices.
This contrasts with the more "holistic" method employed
in 0, O, where Y was constructed by directly inverting 
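$\mathcal{P}_T\mathcal{R}\mathcal{P}_T$:

$$Y = \mathcal{R}\mathcal{P}_T\bigl[\mathcal{P}_T\mathcal{R}\mathcal{P}_T\bigr]^{-1}\operatorname{sgn}\rho. \tag{40}$$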

Presumably, the optimal sequential scheme is the one
which chooses the coefficient $c_a$ in every step such as to
minimize the distance to the vector we aim to approach. If the
distance is measured in 2-norm, it is simple to write down a
closed-form expression for that choice. However, such a strategy
introduces strong dependencies into the random process,
which make an analysis challenging. The elementary i.i.d.
tools employed in this paper are no longer applicable. This
intuition motivates considering martingale generalizations of
the operator large-deviation bounds of [24]. We will indeed
prove a deviation estimate for matrix-valued martingales in
Section III-A. Whether this bound is sufficient to analyze the
"optimal sequential scheme" remains unclear.

Another observation is that, since Lemma 5 provides a
uniform bound on $\|(\mathcal{P}_T\mathcal{R}\mathcal{P}_T - \mathbb{1})F\|_2$, there is no need for
the iterative scheme to choose a different set of basis elements
in each run, in order to achieve exponential convergence of
$\mathcal{P}_T Y_i \to \operatorname{sgn}\rho$. Iterating over a single fixed set of $O(nr\ln n)$
basis elements would equally do the job. Unfortunately, the
statement of Lemma 7 is not uniform in $F \in T$, necessitating
the less-optimal approach used above in order to control
$\|\mathcal{P}_T^\perp Y_l\|$. However, a smart substitute for the crude union
bound could potentially remedy this situation.

By the same token, one can replace Lemma 5 by a non-uniform
estimate. The golfing scheme only requires that
$\|(\mathcal{P}_T\mathcal{R}_i\mathcal{P}_T - \mathbb{1})X_i\|_2$ be small, which is much easier to
guarantee than a similar bound on $\|\mathcal{P}_T\mathcal{R}_i\mathcal{P}_T - \mathcal{P}_T\|$. This
is precisely the role of Theorem 12 below, on which bounds
of order $O(rn\nu\ln n)$ can be based (see Section III-B).

We remark that 0, analyzed (40) by expanding the
inverse into a Neumann series

$$(\mathcal{P}_T\mathcal{R}\mathcal{P}_T)^{-1} = \sum_{j=0}^{\infty}(\mathbb{1} - \mathcal{P}_T\mathcal{R}\mathcal{P}_T)^{j}. \tag{41}$$

There is a formal analogy between this series and our construction,
in particular in the light of (38). Note however, that
the product in (38) involves distinct and independently drawn
sampling operators $\mathcal{R}_i$ in every factor. Informally speaking,
this added degree of independence seems to make (38) a more
benign object than the powers $(\mathbb{1} - \mathcal{P}_T\mathcal{R}\mathcal{P}_T)^{j}$ in (41).

G. The certificate: general case 

In this section, we show that the construction of $Y$ described
above continues to work if the assumption (3) on the operator
norm of the basis elements is replaced by the incoherence
properties (0] 

Indeed, in the discussion of the golfing scheme, we referred
to the operator norm of $w_a$ exactly once. In the proof of
Lemma 7 we considered the quantity

$$X_a = \frac{n^2}{m}\mathcal{P}_T^\perp w_a(w_a, F). \tag{42}$$

After Equation (37), the variance

$$\|\mathbb{E}[X_{A_i}^2]\| \leq \frac{n^2}{m^2}\sum_a (w_a, F)^2\,\bigl\|(\mathcal{P}_T^\perp w_a)^2\bigr\|$$

was upper-bounded using the fact that $\|(\mathcal{P}_T^\perp w_a)^2\| \leq \frac{\nu}{n}$.
Clearly the absence of this assumption can be compensated for
by a suitable bound on $(w_a, F)^2$. This will be made precise
below.

Assume that $F$ is some matrix in $T$ with $\|F\|_2 = 1$. Further,
assume that at least one of the following two bounds

$$\max_a \|w_a\|^2 \leq \frac{\nu}{n}, \tag{43}$$
$$\max_a |(w_a, F)|^2 \leq \frac{\nu}{n^2} \tag{44}$$

holds.
Note that

$$\|\mathbb{E}[X_{A_i}^2]\| \leq \frac{n^2}{m^2}\max_\psi\sum_a (w_a, F)^2\,(\psi, w_a^2\,\psi), \tag{45}$$

where the maximum is over all normalized vectors $\psi \in (\operatorname{range}\rho)^\perp$.
Let $\psi_0$ be a vector achieving the maximum.
Define two vectors $p, q$ in $\mathbb{R}^{n^2}$ by setting their components

$$q_a := (w_a, F)^2, \qquad p_a := \frac{1}{n}(\psi_0, w_a^2\,\psi_0) \tag{46}$$

respectively. The assumption that $\|F\|_2 = 1$ implies that
$\|q\|_1 = \sum_a |q_a| = 1$. Slightly less obvious is the fact that
the same is true for the other vector: $\|p\|_1 = 1$, regardless
of the basis chosen. This relation is ascertained by the next
lemma.

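Lemma 8. Let $\{w_a\}$ be a set of $n \times n$-matrices (not
necessarily Hermitian) that fulfill the completeness relation

$$\sum_a \overline{(w_a)_{i_1 j_1}}\,(w_a)_{i_2 j_2} = \delta_{i_1 i_2}\,\delta_{j_1 j_2}.$$

Then

$$\sum_a w_a^\dagger w_a = n\mathbb{1}. \tag{47}$$

Proof: Compute:

$$\Bigl(\sum_a w_a^\dagger w_a\Bigr)_{ij} = \sum_a\sum_k \overline{(w_a)_{ki}}\,(w_a)_{kj} = \sum_k \delta_{ij} = n\,\delta_{ij}.$$

Thus,

$$\|p\|_1 = \sum_a p_a = \frac{1}{n}(\psi_0, n\mathbb{1}\,\psi_0) = 1.$$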
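The conclusion of Lemma 8, $\sum_a w_a^\dagger w_a = n\mathbb{1}$, can be verified directly for two familiar orthonormal bases. The sketch below does this for the normalized Pauli matrices ($n = 2$) and for the matrix units $e_a e_b^T$ ($n = 3$); both example bases and all helper names are choices made for this illustration only.

```python
def dagger(a):
    """Conjugate transpose of a square matrix given as nested lists."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def completeness_sum(basis):
    """Return sum_a w_a^dagger w_a for a list of square matrices."""
    n = len(basis[0])
    total = [[0j] * n for _ in range(n)]
    for w in basis:
        prod = matmul(dagger(w), w)
        total = [[total[i][j] + prod[i][j] for j in range(n)] for i in range(n)]
    return total

s = 2 ** -0.5
pauli_basis = [                       # normalized Pauli matrices, n = 2
    [[s, 0], [0, s]],
    [[0, s], [s, 0]],
    [[0, -1j * s], [1j * s, 0]],
    [[s, 0], [0, -s]],
]
matrix_units = [[[1.0 if (i, j) == (a, b) else 0.0 for j in range(3)]
                 for i in range(3)] for a in range(3) for b in range(3)]

print(completeness_sum(pauli_basis))    # close to 2 * identity
print(completeness_sum(matrix_units))   # 3 * identity
```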



We return to the vectors in (46). The assumptions made
imply that at least one of the vectors is element-wise bounded
above by $\frac{\nu}{n^2}$. Thus

$$\sum_a p_a q_a \leq \min\bigl\{\|p\|_1\|q\|_\infty,\ \|p\|_\infty\|q\|_1\bigr\} \leq \frac{\nu}{n^2}. \tag{48}$$

Plugging this estimate into the computation of the variance
(45) we obtain

$$\|\mathbb{E}[X_{A_i}^2]\| \leq \frac{\nu}{m\kappa r}.$$

We have proved the general analogue of Lemma 7.

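Lemma 9. Let $F \in T$. Let $f \geq \|F\|_2$ be an upper bound on
the 2-norm of $F$. Assume that one of the two bounds

$$\max_a \|w_a\|^2 \leq \frac{\nu}{n}, \tag{49}$$
$$\max_a (w_a, F)^2 \leq \frac{\nu}{n^2}f^2 \tag{50}$$

holds. Then

$$\Pr\bigl[\|\mathcal{P}_T^\perp\mathcal{R}F\| > t\bigr] \leq 2n\exp\Bigl(-\frac{t^2\kappa r}{4\nu f^2}\Bigr) \tag{51}$$

for $t \leq \sqrt{2/r}\,f$.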

Next, we have to justify the bounds on $(w_a, F)^2$ we imposed
in the previous lemma. By the incoherence assumption, the estimate does
hold for $F = \operatorname{sgn}\rho$, i.e. Lemma 9 may be applied during
the first leg $X_0 = \operatorname{sgn}\rho$ of the "golfing scheme". However,
there is no a priori reason that the same be true for
$X_1 = (\mathbb{1} - \mathcal{P}_T\mathcal{R}_1\mathcal{P}_T)X_0$. For now, all we know about $X_1$ is that
it is an element of $T$ and hence low-rank. This property was
enough for Fourier-type bases, but in the general case, it proves
too weak. We thus have to ensure that "inhomogeneity" of $X_i$
implies inhomogeneity of $X_{i+1}$, a fact that can be ascertained
using yet another Chernoff bound.

Let $\mu(F) = \max_a (w_a, F)^2$ be the maximal squared overlap
between $F$ and any element of the operator basis.



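Lemma 10. Let $F \in T$. Then

$$\Pr\bigl[\mu\bigl((\mathbb{1} - \mathcal{P}_T\mathcal{R}\mathcal{P}_T)F\bigr) > t\bigr] \leq 2n^2\exp\Bigl(-\frac{t\kappa}{4\mu(F)\nu}\Bigr)$$

for all $t \leq \mu(F)$.

Proof: Fix $b \in [1, n^2]$. Define

$$X_a = \frac{1}{m}(w_b, F) - \Bigl(w_b, \frac{n^2}{m}\mathcal{P}_T w_a\Bigr)(w_a, F). \tag{52}$$

Then

$$\sum_{i=1}^m X_{A_i} = \bigl(w_b, (\mathbb{1} - \mathcal{P}_T\mathcal{R}\mathcal{P}_T)F\bigr).$$

Note that the first term in (52) is the expectation value of the
second one. Therefore, $\mathbb{E}[X_{A_i}] = 0$ and the variance of $X_{A_i}$
is bounded above by the variance of the second term alone (as
in the proof of Lemma 5):

$$\begin{aligned}
\mathbb{E}[X_{A_i}^2] &\leq \frac{1}{n^2}\sum_a \frac{n^4}{m^2}(w_b, \mathcal{P}_T w_a)^2 (w_a, F)^2 \\
&\leq \frac{n^2}{m^2}\mu(F)\,\|\mathcal{P}_T w_b\|_2^2 \\
&\leq \frac{n^2\mu(F)\nu r}{m^2 n} = \frac{\mu(F)\nu}{m\kappa} =: V_0^2.
\end{aligned}$$

Further,

$$|X_{A_i}| \leq \frac{1}{m}\mu(F)^{1/2}\Bigl(1 + n^2\frac{\nu r}{n}\Bigr) = \frac{1}{m}\mu(F)^{1/2}(1 + n\nu r).$$

Thus, from the Chernoff bound

$$\Pr\bigl[\,\bigl|\bigl(w_b, (\mathbb{1} - \mathcal{P}_T\mathcal{R}\mathcal{P}_T)F\bigr)\bigr| > \sqrt{t}\,\bigr] \leq 2\exp\Bigl(-\frac{t\kappa}{4\mu(F)\nu}\Bigr)$$

as long as $\sqrt{t}$ does not exceed

$$2mV_0^2/|X_{A_i}| = \frac{2m\mu(F)\nu}{\kappa\,\mu(F)^{1/2}(1 + n\nu r)} \geq \mu(F)^{1/2}.$$

The advertised estimate follows by taking squares and
applying the union bound over the $n^2$ elements of the basis.

■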

With these preparations made, we can repeat the "golfing"
argument from the last section. As an additional constraint,
we demand that

$$\mu(X_i) \leq c_i^2\,\mu(X_{i-1})$$

be fulfilled for all $i$, with probability of failure given by $p_4(i)$.
Then, with

$$c_i = 1/2, \qquad t_i = 1/(2\sqrt{r}), \qquad \kappa_i = 64\nu\bigl(\ln(4n^2) + \ln(3l) + \beta\ln n\bigr)$$

it follows that

$$\|X_i\|_2 \leq 2^{-i}\|\operatorname{sgn}\rho\|_2, \qquad \mu(X_i) \leq 2^{-2i}\mu(\operatorname{sgn}\rho) \leq \frac{\nu}{n^2}\bigl(2^{-2i}r\bigr).$$

Thus, in the $i$th iteration of the golfing scheme, we can apply
Lemma 9 with $F = X_i$ and $f = 2^{-i}\sqrt{r}$.

The failure probabilities $p_1$, $p_2(i)$ and $p_3(i)$ are as before.
Further

$$p_4(i) \leq 2n^2\exp\Bigl(-\frac{\kappa_i}{16\nu}\Bigr),$$

which, as the other probabilities, is bounded above by ^n^ 13 . 
By the union bound, Theorem 3 holds as long as

$$m \geq \log_2(2n^2\sqrt{r})\,64\nu\bigl(\ln(4n^2) + \ln(3l) + \beta\ln n\bigr)\,rn.$$



III. Refined methods and generalizations 
A. Martingale methods for matrix-valued random variables 

The purpose of this section is two-fold. First, we derive
a dimension-free bound for the norm of the sum of vector-valued
random variables (Theorem 12). Substituting Lemma 5
by this dimension-free analogue will enable us to give tighter
bounds of matrix recovery in Section III-B (see discussion
in Section II-F3). Such dimension-free bounds for sums of
vectors are well-known [32] and we could in principle content
ourselves with citing an existing version. Making the proof
explicit, however, ensures that this document remains self-contained
and allows us to record a corollary which may
be of independent interest. Indeed, the simplest argument in
[32] relies on a standard large-deviation bound for real-valued
martingales. We use the occasion to prove an operator version
(Theorem 11) of this martingale estimate, which generalizes
the operator Chernoff bound. This constitutes the second
purpose of the present section.

Let $X_1, \dots, X_m$ be a sequence of random variables. We will
use the bold-face symbol $\mathbf{X}_i$ to refer to the set $\{X_1, \dots, X_i\}$
of the first $i$ of these variables. Theorem 11 is an almost
verbatim translation of the real-valued statement in [30] (see
also [33]). To lift it to operator-valued variables, we use exactly
the same tricks that were employed in [24] to obtain the
operator Chernoff bound (c.f. our exposition in Section II-D).

Theorem 11 (Variance bound for matrix-valued martingales).
Let $X_0, \dots, X_m$ be arbitrary random variables. Let $Z_0 = 0$
and let $Z_1, \dots, Z_m$ be a sequence of $(n \times n)$-Hermitian matrix-valued
random variables. Assume the martingale condition

$$\mathbb{E}[Z_i \mid \mathbf{X}_{i-1}] = Z_{i-1}$$

holds for $i = 1, \dots, m$. Assume further that the martingale
difference sequence $D_i = Z_i - Z_{i-1}$ respects

$$\|D_i\| \leq c_i, \qquad \bigl\|\mathbb{E}[D_i^2 \mid \mathbf{X}_{i-1}]\bigr\| \leq \sigma_i^2.$$

Then, with $V = \sum_{i=1}^m \sigma_i^2$,

$$\Pr[\|Z_m\| > t] \leq 2n\exp\Bigl(-\frac{t^2}{4V}\Bigr) \tag{53}$$

for any $t \leq 2V/(\max_i c_i)$.

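Proof: As in Section II-D,

$$\Pr[Z_m \not\leq t\mathbb{1}] \leq e^{-\lambda t}\,\mathbb{E}[\operatorname{tr} e^{\lambda Z_m}]$$

for any $\lambda > 0$. Using Golden-Thompson:

$$\begin{aligned}
\mathbb{E}[\operatorname{tr} e^{\lambda Z_m}] &= \mathbb{E}\bigl[\mathbb{E}[\operatorname{tr} e^{\lambda(Z_{m-1} + D_m)} \mid \mathbf{X}_{m-1}]\bigr] \\
&\leq \mathbb{E}\bigl[\mathbb{E}[\operatorname{tr} e^{\lambda Z_{m-1}}e^{\lambda D_m} \mid \mathbf{X}_{m-1}]\bigr] \\
&= \mathbb{E}\bigl[\operatorname{tr} e^{\lambda Z_{m-1}}\,\mathbb{E}[e^{\lambda D_m} \mid \mathbf{X}_{m-1}]\bigr] \\
&\leq \mathbb{E}\bigl[\operatorname{tr} e^{\lambda Z_{m-1}}\,\bigl\|\mathbb{E}[e^{\lambda D_m} \mid \mathbf{X}_{m-1}]\bigr\|\bigr].
\end{aligned}$$

From the martingale condition:

$$\mathbb{E}[\lambda D_i \mid \mathbf{X}_{i-1}] = \lambda\,\mathbb{E}[Z_i - Z_{i-1} \mid \mathbf{X}_{i-1}] = 0.$$

Once more, we will make use of the estimate $1 + y \leq e^y \leq 1 + y + y^2$
valid for $|y| \leq 1$:

$$\begin{aligned}
\mathbb{E}[e^{\lambda D_i} \mid \mathbf{X}_{i-1}] &\leq \mathbb{1} + \mathbb{E}[\lambda D_i \mid \mathbf{X}_{i-1}] + \mathbb{E}[\lambda^2 D_i^2 \mid \mathbf{X}_{i-1}] \\
&= \mathbb{1} + \lambda^2\,\mathbb{E}[D_i^2 \mid \mathbf{X}_{i-1}] \\
&\leq \exp\bigl(\lambda^2\,\mathbb{E}[D_i^2 \mid \mathbf{X}_{i-1}]\bigr),
\end{aligned}$$

as long as $\lambda\|D_i\| \leq 1$. Thus

$$\bigl\|\mathbb{E}[e^{\lambda D_i} \mid \mathbf{X}_{i-1}]\bigr\| \leq \bigl\|\exp\bigl(\lambda^2\,\mathbb{E}[D_i^2 \mid \mathbf{X}_{i-1}]\bigr)\bigr\| \leq e^{\lambda^2\sigma_i^2}.$$

By induction

$$\mathbb{E}[\operatorname{tr} e^{\lambda Z_m}] \leq \mathbb{E}[\operatorname{tr} e^{\lambda Z_1}]\,e^{\lambda^2(\sigma_2^2 + \dots + \sigma_m^2)} \leq n\,e^{\lambda^2 V}.$$

The claim follows by setting $\lambda = t/(2V)$. ■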
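For completeness, the last step can be spelled out. The following worked computation only uses the bound $\Pr[Z_m \not\leq t\mathbb{1}] \leq n\exp(-\lambda t + \lambda^2 V)$ obtained in the proof, together with the requirement $\lambda\|D_i\| \leq 1$; it is a reader's check rather than part of the original argument.

```latex
\begin{align*}
\frac{d}{d\lambda}\bigl(-\lambda t + \lambda^2 V\bigr) = -t + 2\lambda V = 0
  &\;\Longrightarrow\; \lambda = \frac{t}{2V}, \\
\bigl(-\lambda t + \lambda^2 V\bigr)\Big|_{\lambda = t/(2V)}
  &= -\frac{t^2}{2V} + \frac{t^2}{4V} = -\frac{t^2}{4V}, \\
\lambda \max_i c_i \leq 1
  &\;\Longleftrightarrow\; t \leq \frac{2V}{\max_i c_i}.
\end{align*}
```

Repeating the estimate for $-Z_m$ and applying the union bound accounts for the factor of 2 in (53).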
The next theorem is essentially contained in Chapter 6 of
[32] (see also [34]). To keep the presentation self-contained,
we give a short proof in Appendix VI.

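Theorem 12 (Vector Bernstein inequality). Let $X_1, \dots, X_m$
be independent zero-mean vector-valued random variables.
Let

$$N = \Bigl\|\sum_{i=1}^m X_i\Bigr\|_2.$$

Then

$$\Pr\bigl[N \geq \sqrt{V} + t\bigr] \leq \exp\Bigl(-\frac{t^2}{4V}\Bigr),$$

where $V = \sum_i \mathbb{E}[\|X_i\|_2^2]$ and $t \leq V/(\max_i\|X_i\|_2)$.

We can now prove a non-uniform, but dimension-independent
version of Lemma 5.

Lemma 13. Let $F \in T$. Then

$$\Pr\bigl[\|(\mathcal{P}_T\mathcal{R} - \mathbb{1})F\|_2 \geq t\|F\|_2\bigr] \leq \exp\Bigl(-\frac{(t - \sqrt{2\nu/\kappa})^2\,\kappa}{8\nu}\Bigr),$$

provided $t \leq 2/3$.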
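Proof: Let

$$X_i = \frac{n^2}{m}\mathcal{P}_T w_{A_i}(w_{A_i}, F) - \frac{1}{m}F.$$

Then

$$\begin{aligned}
\mathbb{E}[\|X_i\|_2^2] &\leq \frac{n^4}{m^2}\,\mathbb{E}\bigl[(w_A, F)^2\,\|\mathcal{P}_T w_A\|_2^2\bigr] \\
&\leq \frac{n^4}{m^2}\frac{2\nu r}{n}\frac{1}{n^2}\|F\|_2^2 = \frac{2\nu}{m\kappa}\|F\|_2^2 =: V_0^2,
\end{aligned}$$

so that $V = mV_0^2 = \frac{2\nu}{\kappa}\|F\|_2^2$. Next,

$$\|X_i\|_2 \leq \frac{1}{m}\|F\|_2\,(2\nu r n + 1) \leq \frac{3\nu}{\kappa}\|F\|_2.$$

So that

$$V/\|X_i\|_2 \geq \tfrac{2}{3}\|F\|_2.$$

Now use Theorem 12. ■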
Note added: After the pre-print version of this paper was
published, the author was made aware of a related matrix-valued
martingale bound in [35]. The derivations used in [35]
are very similar in spirit to ours (however, their results cannot
be applied directly to the problem treated here, because no
variance information is incorporated). A few months after our
pre-print appeared, more sophisticated matrix-valued martingale
bounds were established in [36].

B. Tighter bounds for Fourier-type bases 

We present a refined analysis of the "golfing scheme", which
achieves fairly tight bounds for Fourier-type bases. Compared
to Section II-F there are two changes in the argument. First,
we use the dimension-free large deviation bound for vectors
derived in the previous section. Second, the parameters of the
random process used to construct the certificate are chosen
more carefully.

Let $\alpha > 4$ be a number to be chosen later. We will analyze
the following set of parameters for the golfing scheme:

$$\begin{aligned}
\kappa_i &= 18(\ln\alpha + \beta)\,\nu\,c_i^{-2}, \\
c_1 = c_2 &= \frac{1}{2\ln^{1/2} n}, \qquad c_i = \frac{1}{2} \quad (2 < i \leq l), \\
t_1 = t_2 &= \frac{1}{4\sqrt{r}}, \qquad t_i = \frac{\ln n}{4\sqrt{r}} \quad (2 < i \leq l), \\
l &= \lceil\log_2(2n^2\sqrt{r})\rceil.
\end{aligned}$$

Using the arguments from Section II-F

$$\|X_i\|_2 \leq \sqrt{r}\prod_{j=1}^{i} c_j = \sqrt{r}\,2^{-i}\begin{cases} (\ln n)^{-i/2} & i = 1, 2 \\ (\ln n)^{-1} & i > 2. \end{cases}$$


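Thus

$$\|\mathcal{P}_T^\perp Y_l\| \leq \sum_{i=1}^{l} t_i\|X_{i-1}\|_2 \leq \frac{1}{4}\Bigl(1 + \frac{1}{2\ln^{1/2} n} + \frac{1}{4}\frac{\ln n}{\ln n} + \frac{1}{8}\frac{\ln n}{\ln n} + \dots\Bigr) < \frac{1}{2}$$

and

$$\|\mathcal{P}_T Y_l - \operatorname{sgn}\rho\|_2 = \|X_l\|_2 \leq \frac{1}{2n^2}$$

as required by (35).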

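We look at the failure probabilities. To bound $p_2(i)$,
we make use of the dimension-free estimate provided by
Lemma 13:

$$p_2(i) \leq \exp\Bigl(-\frac{(\frac{2}{3}c_i)^2\,9(\ln\alpha + \beta)\,c_i^{-2}}{4}\Bigr) = \frac{1}{\alpha}e^{-\beta}.$$

The failure probabilities concerning the assertions about
$\|\mathcal{P}_T^\perp Y_l\|$ are bounded, as before, by Lemma 7. Note that
we need to employ the "Poissonian" part of the lemma, i.e.
Eq. (26), when $i > 2$.

$$p_3(1), p_3(2) \leq \exp\Bigl(-\frac{18(\ln\alpha + \beta)\ln n}{16} + \ln(2n)\Bigr) \leq \frac{1}{\alpha}e^{-\beta},$$

$$p_3(i) \leq \exp\Bigl(-\frac{\ln n\;9(\ln\alpha + \beta)}{\sqrt{2}} + \ln(2n)\Bigr) \leq \frac{1}{\alpha}e^{-\beta} \qquad (i > 2).$$

Lastly, $p_1$ can be comfortably bounded by

$$\begin{aligned}
p_1 &\leq \exp\Bigl(-\frac{\kappa}{32\nu} + \ln(4nr)\Bigr) \\
&\leq \exp\bigl(-2(\ln\alpha + \beta)(2\ln n + (l - 1)) + 2\ln n + \ln 4\bigr) \\
&\leq \frac{1}{\alpha}e^{-\beta}.
\end{aligned}$$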

A first improved estimate may be achieved at this point by
setting $\alpha = 2l$. From a simple application of the union bound
we infer that the total probability of error is smaller than $e^{-\beta}$.
In total, the process will have accessed fewer than

$$18(\ln(2l) + \beta)\,\nu\,4\bigl(2\ln n + \log_2(2n^2\sqrt{r}) - 2\bigr)\,nr = O\bigl(nr\nu(\beta + \ln\ln n)\ln n\bigr) \tag{54}$$

expansion coefficients.

Theorem 14. Let $\rho$ be a rank-$r$ matrix and suppose that $\{w_a\}$
is an operator basis fulfilling $\max_a \|w_a\|^2 \leq \frac{\nu}{n}$. Then the
solution $\sigma^*$ to the optimization problem (12) is unique and equal
to $\rho$ with probability of failure smaller than $e^{-\beta}$, provided that

$$|\Omega| \geq O\bigl(nr\nu(\beta + \ln\ln n)\ln n\bigr).$$

Largely for aesthetic reasons, we provide a further refinement
which does away with the $(\ln\ln n)$-term in (54). Recall
its origin. Let $p(i) \leq p_2(i) + p_3(i)$ be the probability that at
least one of the two assumptions

$$\|(\mathcal{P}_T\mathcal{R}_i\mathcal{P}_T - \mathbb{1})X_{i-1}\|_2 \leq c_i\|X_{i-1}\|_2, \tag{55}$$
$$\|\mathcal{P}_T^\perp\mathcal{R}_i X_{i-1}\| \leq t_i\|X_{i-1}\|_2 \tag{56}$$

made about the $i$th batch does not hold. In the argument above,
we employed the union bound which ascertains that the total
probability of failure is bounded above by $l\max_i p(i)$. To
make this expression a constant, $\max_i p(i)$ must be $O(l^{-1})$.
This, in turn, was achieved by setting $\ln\alpha = O(\ln l) = O(\ln\ln n)$.

There is an alternative construction for the dual certificate
which turns out to yield a better estimate. Informally, the idea
is to draw $l' > l$ batches, but to include into the golfing scheme
only those batches for which the assumptions (55), (56) hold.
We must choose $l'$ large enough that, with high probability,
$l$ of the batches do fulfill the assumptions. There is hence a
further degree of freedom in the choice of the parameters:
decreasing the $\kappa_i$ increases the average number of batches not
meeting the assumptions, which can be compensated for by
increasing $l'$. It will be shown below that this freedom may
be used to improve the bounds.

To give a formal description of the construction, we re-state
the slightly modified definitions of the objects occurring in the
golfing scheme. The most important change is the introduction
of a function $f : [1, l] \to [1, l']$ which enumerates the batches
to be included. More precisely, the objects

$$\mathcal{R}_i : \sigma \mapsto \frac{n^2}{m_i}\sum_{j = m_1 + \dots + m_{i-1} + 1}^{m_1 + \dots + m_i} w_{A_j}(w_{A_j}, \sigma), \qquad X_0 = \operatorname{sgn}\rho, \qquad X_i = \operatorname{sgn}\rho - \mathcal{P}_T Y_i$$

are defined as before, while

$$Y_i = \sum_{j=1}^{i}\mathcal{R}_{f(j)}X_{j-1}$$

now only depends on a subset of batches. The function $f$, in
turn, is defined by setting $f(0) = 0$ and $f(i)$ to

$$\min j > f(i - 1) \quad\text{such that}\quad \|(\mathcal{P}_T\mathcal{R}_j\mathcal{P}_T - \mathbb{1})X_{i-1}\|_2 \leq c_i\|X_{i-1}\|_2, \quad \|\mathcal{P}_T^\perp\mathcal{R}_j X_{i-1}\| \leq t_i\|X_{i-1}\|_2.$$

It remains to choose the parameters of the golfing scheme.
With foresight, set $\alpha = 6$. Then the probability $p(i)$ of the $i$th
batch ($i > 2$) being discarded (i.e. $i$ not being in the range of
$f$) is smaller than

$$p(i) \leq p_2(i) + p_3(i) \leq \frac{1}{3}e^{-\beta}.$$

By the standard Chernoff-Hoeffding bound (e.g. Theorem 2.3a in [33];
one could also use the Bernstein inequality derived in this paper,
obtaining slightly worse constants):

$$p_5 := \Pr\bigl[(\text{number of batches in the range of } f) < l\bigr] \leq \exp\Bigl(-\frac{2(\frac{2}{3}l' - l)^2}{l'}\Bigr).$$

We consider this bound in two regimes. First assume that
$n \geq 2^{5(\beta + \ln 6)}$, so that $l \geq 2\log_2 n \geq 9(\beta + \ln 6)$. Choose $l' = 2l$.
The exponent becomes $-\frac{l}{9} \leq -(\beta + \ln 6)$. Next, drop the
assumption on $n$ and instead demand $\beta \geq 8 + 3\ln 6$. Set
$l' = \frac{3}{2}\beta l$. In this case a few simple manipulations yield for
the exponent

$$-\frac{4(\beta - 1)^2\,l}{3\beta} \leq -(\beta + \ln 6).$$

In either case:

$$p_5 \leq \frac{1}{\alpha}e^{-\beta}.$$

By the union bound, the total probability of failure is smaller
than

$$P \leq p_1 + p_2(1) + p_2(2) + p_3(1) + p_3(2) + p_5 \leq \frac{6}{\alpha}e^{-\beta} = e^{-\beta}.$$

Under the first assumption ($n \geq 2^{5(\beta + \ln 6)}$) the scheme
required knowledge of fewer than

$$18(\ln 6 + \beta)\,\nu\,4\bigl(2\ln n + 2\log_2(2n^2\sqrt{r})\bigr)\,nr = O\bigl(nr\nu(\beta + 1)\ln n\bigr)$$

coefficients. In the second case ($\beta \geq 8 + 3\ln 6$) the number is

$$18(\ln 6 + \beta)\,\nu\,4\bigl(2\ln n + \tfrac{3}{2}\beta\log_2(2n^2\sqrt{r})\bigr)\,nr = O\bigl(nr\nu(\beta + 1)^2\ln n\bigr).$$

Theorem 4 follows.

Remark: All the arguments of this section remain valid
when the bound on the operator norm of the basis is dropped.
The sole obstruction preventing us from stating $O(rn\nu\ln n)$
bounds for the more general case is the union bound in
Lemma 10. While it seems plausible that one can overcome
this difficulty with reasonable effort, the author has so far
failed to do so.
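The trade-off between $l$ and $l'$ can be explored numerically. The sketch below evaluates the Chernoff-Hoeffding bound on $p_5$, assuming it reads $\exp(-2(\frac{2}{3}l' - l)^2/l')$, i.e. each batch is kept with probability at least $2/3$; both this reading of the bound and the function name are assumptions made for the illustration.

```python
import math

def discard_bound(l, l_prime):
    """Bound exp(-2 (2 l'/3 - l)^2 / l') on the probability that fewer
    than l of the l' drawn batches are accepted."""
    return math.exp(-2.0 * (2.0 * l_prime / 3.0 - l) ** 2 / l_prime)

for l in (9, 21, 45):
    print(l, discard_bound(l, 2 * l))   # equals exp(-l / 9) for l' = 2 l
```

With $l' = 2l$ the exponent is $-l/9$, which is the value used in the first regime.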



C. A lower bound 

Reference [3] gave lower bounds of order $O(nr\nu\ln n)$ for
the number $|\Omega|$ of matrix elements necessary to fix a rank-$r$
matrix. Since the theory of low-rank matrix recovery seems 
better-behaved for Fourier-type bases, it might be conjectured 
that fewer coefficients are sufficient in this case. This hope 
turns out not to be realized. 

The results of this section imply that the bound of Theo- 
rem 2] is tight up to multiplicative constants. 

Theorem 15. Let $n = 2^k$ be a power of two. Let $\{w_a\}$ be
the Pauli basis defined in Section I-B2.

1) Let $\Omega$ be any subset of $[1, n^2]$. If $|\Omega| < (n - 2)\log_2 n$,
then there are two rank-one projections $P_1, P_2$ with
orthogonal range such that $(w_a, P_1) = (w_a, P_2)$ for all
$a \in \Omega$.

2) There is a rank-one projection Pi with the following 
property. Let Q be a set of numbers in [1, n 2 ], obtained 
by sampling 



m < 



1 



-nlog 2 n 



+ 

times with replacement. Then with probability 



Pf > I 1 — n 2 1n2(l + ./3) 



there exists a rank-one projection $P_2$, orthogonal to $P_1$,
such that $(w_a, P_1) = (w_a, P_2)$ for all $a \in \Omega$.

The proof makes use of the theory of stabilizer states, 
a common notion in quantum information theory lfT31 . To 
make the presentation self-contained, we have included the 
briefest outline of this theory as Appendix IVIII The proof 
below assumes familiarity with the notions introduced in the 
appendix. 

Proof: In the statement of the theorem, we used a "one-dimensional"
labeling of the Pauli basis elements $w_a$ by
numbers $a \in [1, n^2]$. In Section VII on stabilizer theory,
a "two-dimensional" labeling in terms of pairs $(p, q)$ from
$[1, n] \times [1, n]$ proved more convenient. We assume that some
mapping identifying the one set with the other has been chosen
and will subsequently not distinguish between them.

For the first statement:

By Prop. 23 there are $n$ stabilizer groups $G_x$, $x \in \mathbb{F}_2^k$,
whose pairwise intersections equal $\{1\}$. If $|\Omega|$ is smaller
than $(n - 2) \log_2 n$, then at least one of these stabilizer
groups intersects $\{w_a \mid a \in \Omega\}$ in $l < \log_2 n = k$ elements.
Call that stabilizer group $G$. By Prop. 24 there are distinct
characters $\chi_1, \chi_2$ of $G$ which agree on $G \cap \{w_a \mid a \in \Omega\}$.
By Prop. 22, $P_{1/2} = P(G, \chi_{1/2})$ are two rank-one projectors
with orthogonal range. By Eq. (77), $(w_a, P_1) = (w_a, P_2)$ for
$a \in \Omega$.

We turn to the second claim. Take $P_1 = P(G_x, \chi)$ for some
stabilizer group $G_x$ as in Prop. 23 and some character $\chi$. As
$|G| = n$, the probability of a randomly chosen element of
the basis to be contained in $G$ equals $1/n$. As argued before,
there will be an orthogonal stabilizer projector $P_2$ compatible
with the coefficients in $\Omega$, as soon as the intersection between
$G$ and $\{w_a \mid a \in \Omega\}$ is smaller than $k = \log_2 n$. Thus the
probability that (1) has a unique solution is not larger than
the probability of an event with probability $1/n$ occurring at
least $\log_2 n$ times in $m = n \log_2 n/(1 - \epsilon)$ trials. This quantity
can be bounded by the standard Chernoff-Hoeffding inequality
(e.g. [33], Theorem 2.3(b)). The advertised bound follows.
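The counting step in the proof of the second claim can be made concrete numerically. The following is a minimal sketch, not part of the original argument: instead of a Chernoff-type estimate it evaluates the exact binomial tail, i.e. the probability that an event of probability $1/n$ occurs at least $\log_2 n$ times in $m$ independent trials. The function names `binomial_upper_tail` and `unique_solution_bound` and the sample values of $k$ and $m$ are illustrative assumptions.

```python
from math import comb, log2


def binomial_upper_tail(trials: int, p: float, threshold: int) -> float:
    """Exact P[Bin(trials, p) >= threshold], via the complementary sum."""
    below = sum(
        comb(trials, j) * p**j * (1.0 - p) ** (trials - j)
        for j in range(threshold)
    )
    return 1.0 - below


def unique_solution_bound(k: int, m: int) -> float:
    """Upper bound used in the proof: an event of probability 1/n has to
    occur at least k = log2(n) times in m draws, where n = 2**k."""
    n = 2**k
    return binomial_upper_tail(m, 1.0 / n, k)


if __name__ == "__main__":
    k = 6
    n = 2**k
    for fraction in (0.25, 0.5, 0.75):
        m = int(fraction * n * log2(n))
        bound = unique_solution_bound(k, m)
        print(f"m = {m:4d}: bound on P[unique solution] = {bound:.4f}")
```

The printed values grow with $m$, which matches the qualitative message of the theorem: as long as $m$ stays a constant fraction below $n \log_2 n$, the bound on the probability of unique recovery stays small.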



D. Non-Hermitian setting 

We presented the argument in terms of Hermitian matrices 
because this is the natural setting for the Operator-Bernstein 
inequality. It is, however, straight-forward to extend the results 
to arbitrary complex matrices. The construction in this section 
serves as a simple proof of principle; a more refined analysis 
is certainly possible. 

Indeed, assume both $\rho$ and the $\{w_a\}$ are arbitrary complex
$n \times n$ matrices (in this section, we break with our previous
convention that any matrix is automatically assumed to be
Hermitian unless stated otherwise). We will employ a standard
construction [12], associating with any complex $n \times n$-matrix
$\sigma$ a Hermitian $2n \times 2n$-matrix

$$\tilde{\sigma} = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 & \sigma \\ \sigma^\dagger & 0 \end{pmatrix}. \qquad (57)$$



The obvious strategy pursued below consists of the following 
steps: 

(i) from $\{w_a\}$, build a suitable Hermitian basis in the space
$\mathcal{M}_{2n}$ of $2n \times 2n$ matrices,

(ii) formulate a matrix recovery problem in terms of $\tilde{\rho}$ and
the basis constructed before,

(iii) compute the incoherence properties of $\tilde{\rho}$ with respect to
that basis,

(iv) apply the methods detailed in this paper in the extended
space, and

(v) show that the original matrix recovery algorithm (i.e. the
program (1)) applied to $\rho$, $\{w_a\}$ is no more likely to fail
than the one in the extended space.

To this end, we start by collecting some basic properties of
the mapping $\sigma \mapsto \tilde{\sigma}$.

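Lemma 16.

1) For $\sigma_1, \sigma_2 \in \mathcal{M}_n$:

$$(\tilde{\sigma}_1, \tilde{\sigma}_2) = \operatorname{Re}\big((\sigma_1, \sigma_2)\big).$$

2) Let $\{w_a\}_a$ be an ortho-normal basis in the complex
vector space $\mathcal{M}_n$. Then

$$\{\tilde{w}_a\}_a \cup \{\widetilde{i w_a}\}_a$$

is an ortho-normal basis in the real vector space of
Hermitian off-diagonal matrices of the form (57).

3) Let

$$\sigma = \sum_{i=1}^{r} s_i\, \psi_i \phi_i^\dagger \qquad (58)$$

be the singular value decomposition of $\sigma \in \mathcal{M}_n$. The
$2r$ vectors in $\mathbb{C}^n \oplus \mathbb{C}^n$ of the form

$$\frac{1}{\sqrt{2}} \big(\psi_i \oplus (\pm \phi_i)\big) \qquad (59)$$

are the normalized non-zero eigenvectors of $\tilde{\sigma}$, with
eigenvalues $\pm \frac{1}{\sqrt{2}} s_i$. In particular,

$$\|\tilde{\sigma}\| = \frac{1}{\sqrt{2}} \|\sigma\|, \qquad \|\tilde{\sigma}\|_1 = \sqrt{2}\, \|\sigma\|_1,$$

and $\operatorname{rank} \tilde{\sigma} = 2 \operatorname{rank} \sigma$.

4) With $\sigma$ as above, set

$$E(\sigma) = \sum_{i=1}^{r} \psi_i \phi_i^\dagger$$

(the non-Hermitian analogue of $\operatorname{sgn} \sigma$; c.f. [3]). Then
$\operatorname{sgn} \tilde{\sigma} = \sqrt{2}\, \widetilde{E(\sigma)}$.

Proof: Compute:

$$(\tilde{\sigma}_1, \tilde{\sigma}_2) = \frac{1}{2} \operatorname{tr} \left[ \begin{pmatrix} 0 & \sigma_1 \\ \sigma_1^\dagger & 0 \end{pmatrix} \begin{pmatrix} 0 & \sigma_2 \\ \sigma_2^\dagger & 0 \end{pmatrix} \right] = \frac{1}{2} \left( \operatorname{tr} \sigma_1 \sigma_2^\dagger + \operatorname{tr} \sigma_1^\dagger \sigma_2 \right) = \operatorname{Re}\big((\sigma_1, \sigma_2)\big),$$

which implies the first two claims. Verifying statement 3 is
trivial.

Let $\psi_i^{(1)} = \psi_i \oplus 0$, $\phi_i^{(2)} = 0 \oplus \phi_i$. Let $P_+$ be the projection
onto the positive part of $\tilde{\sigma}$, let $P_-$ project onto the negative
part. From (59) it follows that $P_\pm$ equals

$$P_\pm = \frac{1}{2} \sum_i \big(\psi_i^{(1)} \pm \phi_i^{(2)}\big) \big(\psi_i^{(1)} \pm \phi_i^{(2)}\big)^\dagger.$$

Thus

$$\operatorname{sgn} \tilde{\sigma} = P_+ - P_- = \sum_i \Big( \psi_i^{(1)} \big(\phi_i^{(2)}\big)^\dagger + \phi_i^{(2)} \big(\psi_i^{(1)}\big)^\dagger \Big) = \sqrt{2}\, \widetilde{E(\sigma)}.$$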
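The properties of the Hermitian dilation collected in this lemma are easy to check numerically on small examples. The sketch below is an illustration only. It assumes the normalization $\tilde{\sigma} = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 & \sigma \\ \sigma^\dagger & 0 \end{pmatrix}$, represents matrices as plain Python lists of rows, and uses hypothetical helper names (`dilate`, `hs_inner`, `eigen_residual`). It verifies the inner-product identity of the first claim and the eigenvector relation of the third claim for a rank-one matrix.

```python
import math


def dagger(a):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def hs_inner(a, b):
    """Hilbert-Schmidt inner product (a, b) = tr(a^dagger b)."""
    prod = matmul(dagger(a), b)
    return sum(prod[i][i] for i in range(len(prod)))


def dilate(sigma):
    """Hermitian dilation (1/sqrt 2) [[0, sigma], [sigma^dagger, 0]]."""
    n = len(sigma)
    c = 1.0 / math.sqrt(2.0)
    sd = dagger(sigma)
    top = [[0j] * n + [c * x for x in sigma[i]] for i in range(n)]
    bottom = [[c * x for x in sd[i]] + [0j] * n for i in range(n)]
    return top + bottom


def rank_one(s, psi, phi):
    """sigma = s * psi * phi^dagger for unit vectors psi, phi."""
    return [[s * x * y.conjugate() for y in phi] for x in psi]


def eigen_residual(sigma, s, psi, phi, sign):
    """|| dilate(sigma) v - sign * (s / sqrt 2) v || for v = (psi (+) sign*phi) / sqrt 2."""
    c = 1.0 / math.sqrt(2.0)
    v = [c * x for x in psi] + [c * sign * y for y in phi]
    image = [sum(row[j] * v[j] for j in range(len(v))) for row in dilate(sigma)]
    return math.sqrt(sum(abs(image[j] - sign * s * c * v[j]) ** 2 for j in range(len(v))))


if __name__ == "__main__":
    s1 = [[1 + 2j, 0.5], [-1j, 3.0]]
    s2 = [[2.0, 1j], [1 - 1j, -1.0]]
    print(hs_inner(dilate(s1), dilate(s2)), hs_inner(s1, s2).real)

    r = 1.0 / math.sqrt(2.0)
    psi, phi, s = [r, r * 1j], [r, -r], 3.0
    sigma = rank_one(s, psi, phi)
    print(eigen_residual(sigma, s, psi, phi, +1), eigen_residual(sigma, s, psi, phi, -1))
```

Both residuals vanish up to rounding, so $\frac{1}{\sqrt{2}}(\psi \oplus (\pm\phi))$ are eigenvectors of the dilation with eigenvalues $\pm s/\sqrt{2}$ in this example.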



We now tackle the first task listed above: building a suitable
basis in $\mathcal{M}_{2n}$. Denote the original basis $\{w_a\}_{a=1}^{n^2}$ by $B$. The
basis $\tilde{B}$ in the extended space is taken to be the set of matrices

J_ / iw c 

V2 V 

J_ ( iw a 

V2 \ -iw] 

for $a = 1, \ldots, n^2$. Note that the first two matrices are just $\tilde{w}_a$
and $\widetilde{i w_a}$, so that Lemma 16.2 is applicable.





' 


w, 


y/2 ( 









W a 





71 ( 


v 
Let $\tilde{\Omega}$ be a set of $m$ randomly chosen elements from $\tilde{B}$.
Below, we will analyze the problem:

$$\min \|\sigma\|_1 \quad \text{subject to} \quad (\sigma, b_a) = (\tilde{\rho}, b_a), \quad \forall\, b_a \in \tilde{\Omega}, \qquad (60)$$

where the minimization is over all Hermitian matrices $\sigma$ in
$\mathcal{M}_{2n}$. This is step (ii) above. (Note again that we are interested
in the program (60) only as a means of proving that the original
program (1) works directly for non-Hermitian objects.)

To handle step (iii), we introduce further notations. Let
$U = \operatorname{range} \rho$, $V = \operatorname{range} \rho^\dagger$ be the row and column space
of $\rho$ respectively. Generalizing our earlier definition to non-Hermitian
operators (and following [2]), let $T$ be the space
of matrices with row space contained in $U$ or column space
contained in $V$. With $P_U$, $P_V$ the orthogonal projections onto $U$, $V$,
the projection operator $\mathcal{P}_T$ onto $T$ acts as

$$\mathcal{P}_T \sigma = P_U \sigma + \sigma P_V - P_U \sigma P_V.$$

By $\tilde{T}$ we mean the set of Hermitian matrices in $\mathcal{M}_{2n}$ with row
or column space contained in $\tilde{U} = \operatorname{range} \tilde{\rho}$. Using these notions,
the following lemma relates the incoherence properties of the
extended setup to the original objects.

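Lemma 17.

$$\max_{b_a \in \tilde{B}} \|b_a\|^2 = \frac{1}{2} \max_{w_a \in B} \|w_a\|^2, \qquad (61)$$

$$\max_{b_a \in \tilde{B}} |(b_a, \operatorname{sgn} \tilde{\rho})|^2 \leq 2 \max_{w_a \in B} |(w_a, E(\rho))|^2, \qquad (62)$$

$$\max_{b_a \in \tilde{B}} \|\mathcal{P}_{\tilde{T}} b_a\|_2^2 \leq \max_{w_a \in B} \|\mathcal{P}_T w_a\|_2^2. \qquad (63)$$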



Proof: The first two claims follow from Lemma 16. We
prove the last statement for $b_a = \tilde{w}_a$; the other cases are
shown analogously. We borrow the notation from the proof of
Lemma 16.

Let $P_U^{(1)}$ project onto the span of the $\psi_i^{(1)}$ and $P_V^{(2)}$ onto
the span of the $\phi_i^{(2)}$. From Lemma 16, $P_{\tilde{U}} = P_U^{(1)} + P_V^{(2)}$.
Let $P^{(1)}$ be the projector onto the first direct summand in
$\mathbb{C}^{2n} = \mathbb{C}^n \oplus \mathbb{C}^n$ and $P^{(2)}$ onto the second. Then



(!),r, p( 2 ) 



TV 

• it I'll- 1 ~r -i <JJa* \r ~ * U ^aPy ■ 

From the analogous relation for the adjoint we conclude that

$$\mathcal{P}_{\tilde{T}} \tilde{w}_a = \widetilde{\mathcal{P}_T w_a},$$

so that the claim follows from Lemma 16.1. ■

We proceed to steps (iv) and (v).

Definition 18 (Coherence, non-Hermitian case). The $n \times n$-matrix
$\rho$ has coherence $\nu$ with respect to a basis $\{w_a\}$ if either



max w„ 



< 2v- 

n 



or the two estimates 



ma-x\\V T w a \\ 2 

a 

max \ (w a ,E(p))\ 2 



< 
< 



„ r 
2v-, 
n 

1 r 
2 V rf 



(64) 

(65) 
(66) 



Corollary 19. The bounds of Theorem 3 and Theorem 4
continue to hold for non-Hermitian $\rho$ and $\{w_a\}$, if the coherence
$\nu$ is measured according to Definition 18 and $n, r$
are substituted by $2n, 2r$ respectively.

Proof: The fact that the problem (60) will have $\sigma^\star = \tilde{\rho}$ as
its unique solution with the probability of success advertised
in Theorems 3 and 4 is an immediate consequence of Lemma 17.
From Lemma 16.3, $\|\tilde{\rho}\|_1 \propto \|\rho\|_1$, so that the $n \times n$ minimization
problem (1) has $\rho$ as its unique solution whenever
the same is true for (60) and $\tilde{\rho}$. ■



IV. Conclusion and Outlook 

A. Outlook 

The following topics will be treated in follow-up publica- 
tions. 

1) Noise resilience: As indicated in [11], the procedures
laid out in this paper are resilient against noise. The analysis
of noise effects in the general case builds on techniques developed
in [4] for the matrix completion problem. It turns out that the
bounds are quite sensitive to the operator norm of the sampling
operator $\mathcal{R}$ (c.f. Eq. (16)). This number is equal to one if the
expansion coefficients were sampled without replacement, and
is likely to be of order $O(\ln n) \gg 1$ for the i.i.d. scheme
presented here. In a future publication, we will prove operator-valued
large deviation bounds for sampling without replacement
[27]. Therefore, a detailed discussion of noise effects will be
deferred until then.

2) Tight frames: Let $\mu$ be a normalized measure on the
unit-sphere of matrices. We refer to $\mu$ as a tight frame (also
a spherical 1-design or a set of matrices in isotropic position
[28], or just an "overcomplete basis") if

$$\mathbb{E}_\mu[\mathcal{P}_w] := \int \mathcal{P}_w \, d\mu(w) = \mathrm{id}, \qquad (67)$$

where $\mathcal{P}_w$ is the orthogonal projection onto $w$. Tight frames
can replace ortho-normal bases in many situations.

In the "Fourier-type" case - i.e. if there is a uniform bound 
on the operator norm ||w|| of the elements of the frame - 
all statements in this paper may be easily translated from 
ortho-normal bases to tight frames. In the absence of such a 
constraint, Lemma 10 may be a source of problems: it contains
a union bound over all elements of the frame and is therefore 
sensitive to its size. In particular, it cannot be directly applied 
to continuous frames. We believe that this difficulty can be 
overcome with medium effort and may present more details 
elsewhere. 

Note that similar conclusions have been drawn before in the 
case of commutative compressed sensing [8].

VI. Appendix A: proof of Theorem 12

For completeness, we give a short proof of Theorem[T2"l(see 
also ED 1551). 

Proof: We aim to use Theorem QT] with n = 1. To that 
end, let 

Zi = E[JV | X,] - E[N] 

be the Doob martingale sequence of X — E[7V] with respect to 
the Xi; let X a = 1. As in Theorem [m set A: = Zi - Z^x- 
Let X, be the set {Xi, . . . , Xi-i, X+i, . . . , X m } of all 
random variables except for the ith one. Finally, let 

Si =^ X 3 

be the sum of all vectors, with the ith term omitted. 
Using the triangle inequality 

|A| = |E[JV|Xi]-E[AT|X<_i]| 



< sup 



N - ~E[N | Xj] 



< 



\Sib- 
\Xih- 



n\\Xih]) 



(68) 



(69) 



Thus 



| A| < max ||JQ|| 2 + E[||X|| 2 ] < 2max||X i || 2 
where the maximum is over all values of Xi. With 



But 



E[£>? |Xi_i] < su P E[(A - E[A|X,]) 2 |X 4 

E[(A-E[A|X 4 ]) 2 |Xi] 
E[JV 2 |Xi]-E[JV|Xi] 2 



< 



Silll + EOlXH^-EOISi + XillalXi] 2 
ISill? + E[||X|| 2 ] - H5, + E[X]|| 2 = E[||X|| 2 ] =: o\ 



H\\2 



It remains to compute the expectation $\mathbb{E}[N] \leq \mathbb{E}[N^2]^{1/2}$.
The square of the latter quantity is

$$\mathbb{E}[N^2] = \sum_{i,j} \mathbb{E}[(X_i, X_j)] = \sum_i \mathbb{E}[\|X_i\|_2^2] = V.$$
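The last identity relies only on independence and zero mean: the cross terms $\mathbb{E}[(X_i, X_j)]$ with $i \neq j$ vanish. As a small sanity check, and under the assumption that $N$ denotes the 2-norm of a sum of independent zero-mean random vectors, the sketch below enumerates a toy distribution exactly. The supports in `SUPPORTS` and all function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def second_moment_of_norm(supports):
    """Exact E[N^2] for N = ||X_1 + ... + X_m||_2, each X_i uniform on its support."""
    total = Fraction(0)
    count = 0
    for choice in product(*supports):
        sx = sum(v[0] for v in choice)
        sy = sum(v[1] for v in choice)
        total += sx * sx + sy * sy
        count += 1
    return total / count


def expected_norm(supports):
    """E[N], averaged over all equally likely joint outcomes."""
    outcomes = [
        sqrt(sum(v[0] for v in c) ** 2 + sum(v[1] for v in c) ** 2)
        for c in product(*supports)
    ]
    return sum(outcomes) / len(outcomes)


def variance_proxy(supports):
    """Sum over i of E[||X_i||_2^2]."""
    return sum(Fraction(sum(x * x + y * y for x, y in s), len(s)) for s in supports)


# Each support is symmetric about the origin, so every X_i has zero mean.
SUPPORTS = [
    [(1, 0), (-1, 0)],
    [(0, 2), (0, -2), (3, 0), (-3, 0)],
    [(1, 1), (-1, -1)],
]

if __name__ == "__main__":
    print(second_moment_of_norm(SUPPORTS), variance_proxy(SUPPORTS))
    print(expected_norm(SUPPORTS), sqrt(variance_proxy(SUPPORTS)))
```

Exact rational arithmetic is used for the second moment so that the equality can be tested without rounding; the expected norm is computed in floating point and only compared through the inequality $\mathbb{E}[N] \leq \mathbb{E}[N^2]^{1/2}$.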



VII. Appendix B: basic theory of stabilizer states 

The lower bound in Section III-C was built around the
concept of "stabilizer states", a concept from quantum infor- 
mation theory. For the convenience of the reader, we give a 
short outline below. The presentation is necessarily both very 
condensed and fairly technical. A more complete account can 
be found e.g. in Refs. fl3), HH, lfl8ll . 

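As a first step, we need to identify a certain group structure
of the elements of the Pauli basis introduced in Section I-B2.

Let $\mathbb{F}_2$ be the finite field of order two (with elements $\{0, 1\}$),
and let $\mathbb{F}_2^k$ be the set of column vectors with $k$ entries from
$\mathbb{F}_2$. We introduce a mapping $w$ from pairs $(p, q) \in (\mathbb{F}_2^k, \mathbb{F}_2^k)$
to unitary matrices on $(\mathbb{C}^2)^{\otimes k}$ by setting

$$w(p, q) = \bigotimes_{j=1}^{k} i^{p_j q_j} (\sigma^x)^{p_j} (\sigma^z)^{q_j}. \qquad (70)$$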

Proposition 20 (Properties of Pauli operators). With $w(p, q)$
as defined in (70), it holds that

1) The $w(p, q)$'s are Hermitian and unitary. It follows that

$$w(p, q)^2 = 1. \qquad (71)$$

2) The Pauli operators form a super-normalized orthogonal
basis:

$$\operatorname{tr} w(p, q)\, w(p', q') = 2^k\, \delta_{p,p'}\, \delta_{q,q'}. \qquad (72)$$

3) For every $p, q, p', q'$, there is a phase $\lambda(p, q, p', q') \in
\{\pm 1, \pm i\}$ such that

$$w(p, q)\, w(p', q') = \lambda(p, q, p', q')\, w(p + p', q + q'). \qquad (73)$$

(In other words, the map $w$ realizes a projective representation
of the additive group of $\mathbb{F}_2^k \times \mathbb{F}_2^k$.)
If $w(p, q)$ and $w(p', q')$ commute, then (73) simplifies to

$$w(p, q)\, w(p', q') = \pm w(p + p', q + q'). \qquad (74)$$

4) The commutation relation

$$w(p, q)\, w(p', q') = w(p', q')\, w(p, q)\, (-1)^{\langle p, q' \rangle - \langle q, p' \rangle} \qquad (75)$$

holds.

Proof: Equations (71), (72), (73), (75) can be checked by
simple direct computation.

To verify Eq. (74), note that the product of two commuting
Hermitian operators is Hermitian. Thus, if $w(p, q)$ and
$w(p', q')$ commute, the l.h.s. of (73) is Hermitian. But the
r.h.s. is Hermitian only if $\lambda(p, q, p', q')$ is real. ■
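The "simple direct computation" behind these properties can be delegated to a short script for small $k$. The sketch below is an illustration under an explicit assumption: it takes the Pauli operators to be tensor products of single-qubit factors $i^{p_j q_j} (\sigma^x)^{p_j} (\sigma^z)^{q_j}$, which may differ from the convention of Section I-B2 by signs. Matrices are plain lists of rows, and the helper names (`w`, `check_pauli_properties`) are hypothetical. It checks Hermiticity and $w^2 = 1$, the trace orthogonality relation, and the commutation relation with a symplectic sign.

```python
from itertools import product

I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SZ = [[1, 0], [0, -1]]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def scaled(c, a):
    return [[c * x for x in row] for row in a]


def close(a, b):
    return all(abs(x - y) < 1e-12 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def dagger(a):
    return [[complex(a[r][c]).conjugate() for r in range(len(a))] for c in range(len(a[0]))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def w(p, q):
    """Assumed Pauli operator: tensor product of i^(p_j q_j) sx^p_j sz^q_j."""
    out = [[1]]
    for pj, qj in zip(p, q):
        factor = I2
        if pj:
            factor = matmul(factor, SX)
        if qj:
            factor = matmul(factor, SZ)
        out = kron(out, scaled(1j ** (pj * qj), factor))
    return out


def check_pauli_properties(k):
    """True if Hermiticity, w^2 = 1, trace orthogonality and the
    commutation relation hold for all labels (p, q) on k qubits."""
    dim = 2**k
    identity = [[1 if i == j else 0 for j in range(dim)] for i in range(dim)]
    bits = list(product((0, 1), repeat=k))
    labels = [(p, q) for p in bits for q in bits]
    for p, q in labels:
        a = w(p, q)
        if not (close(a, dagger(a)) and close(matmul(a, a), identity)):
            return False
        for p2, q2 in labels:
            b = w(p2, q2)
            expected = dim if (p, q) == (p2, q2) else 0
            if abs(trace(matmul(a, b)) - expected) > 1e-12:
                return False
            dot = sum(x * y for x, y in zip(p, q2)) - sum(x * y for x, y in zip(q, p2))
            if not close(matmul(a, b), scaled((-1) ** (dot % 2), matmul(b, a))):
                return False
    return True


if __name__ == "__main__":
    print(check_pauli_properties(1), check_pauli_properties(2))
```

A brute-force check of this kind is only feasible for a handful of qubits, since there are $4^k$ operators of dimension $2^k$, but it is a convenient guard against sign and ordering mistakes in the definition.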

By Eq. (73), the set

$$\mathcal{P}^{(k)} = \{\pm w(p, q), \pm i\, w(p, q) \mid (p, q) \in \mathbb{F}_2^k \times \mathbb{F}_2^k\} \qquad (76)$$

forms a matrix group which is known as the Pauli group.

Certain subgroups of the Pauli group can be used to define
an interesting class of projection operators. These are called
stabilizer groups and defined as follows:

Definition 21. Let $G$ be a subgroup of $\mathcal{P}^{(k)}$. The group $G$ is
called a stabilizer group if

1) it is Abelian,

2) $-1 \notin G$, and

3) its order $|G|$ equals $2^k$.

The connection between stabilizer groups and projection
operators is given in the next proposition.

Proposition 22. Let $G$ be a stabilizer group. Let $\chi$ be a
complex character of $G$ (i.e. $\chi(g g') = \chi(g) \chi(g')$ for $g, g' \in G$).
Set

$$P(G, \chi) = \frac{1}{|G|} \sum_{g \in G} \chi(g)\, g. \qquad (77)$$

Then

$$\operatorname{tr} P(G, \chi) = 1, \qquad (78)$$

$$P(G, \chi)^2 = P(G, \chi), \qquad (79)$$

$$P(G, \chi)^\dagger = P(G, \chi). \qquad (80)$$

In particular, $P(G, \chi)$ is a rank-one projector.
If $\chi'$ is another complex character of $G$, then

$$\operatorname{tr}\big(P(G, \chi)\, P(G, \chi')\big) = \delta_{\chi, \chi'}. \qquad (81)$$



Proof: Equation (78) follows from (72).
Next,

$$P(G, \chi)^2 = \left(\frac{1}{|G|}\right)^2 \sum_{h, g \in G} \chi(hg)\, hg = \left(\frac{1}{|G|}\right)^2 |G| \sum_{g \in G} \chi(g)\, g = P(G, \chi)$$

because for $h \in G$ it holds that $hG = G$ (which is true for
any group).

From Def. 21.2 and Eq. (71), it follows that $g^2 = 1$ for
$g \in G$. Hence $\chi(g)^2 = 1$ so that $\chi(g) = \pm 1$. Thus Eq. (77) is
a real linear combination of Hermitian operators and therefore
Hermitian. This proves Eq. (80).

Lastly,

$$\operatorname{tr} P(G, \chi)\, P(G, \chi') = \left(\frac{1}{|G|}\right)^2 \sum_{h, g \in G} \chi(h)\, \chi'(g) \operatorname{tr} hg = \frac{1}{|G|} \sum_{g \in G} \chi(g)\, \chi'(g) = \delta_{\chi, \chi'},$$

having used Eq. (72) and the standard orthogonality relation
for characters of finite groups (see e.g. [37, Corollary 2.14]).

■
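To see the statements of this proposition at work, one may consider the simplest two-qubit example. The sketch below is illustrative only: it assumes the stabilizer group $G = \{1, Z \otimes 1, 1 \otimes Z, Z \otimes Z\}$, whose characters are fixed by a pair of signs on the two generators, and uses hypothetical helper names. It builds $P(G, \chi)$ as the character-weighted group average and evaluates traces, idempotence and mutual orthogonality. It also shows that two characters agreeing on $Z \otimes 1$ give orthogonal projectors with the same expansion coefficient for that basis element, which is the mechanism used in the lower bound of Section III-C.

```python
I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


# Generators of the assumed stabilizer group {1, Z1, Z2, Z1 Z2} on two qubits.
GENERATORS = (kron(Z, I2), kron(I2, Z))


def stabilizer_projector(s1, s2):
    """P(G, chi) = (1/|G|) sum_g chi(g) g with chi(Z1) = s1 and chi(Z2) = s2."""
    z1, z2 = GENERATORS
    group = [(1, kron(I2, I2)), (s1, z1), (s2, z2), (s1 * s2, matmul(z1, z2))]
    return [
        [sum(chi * g[i][j] for chi, g in group) / 4 for j in range(4)]
        for i in range(4)
    ]


def pauli_coefficient(g, signs):
    """Expansion coefficient tr(g P(G, chi)) for the character given by signs."""
    return trace(matmul(g, stabilizer_projector(*signs)))


if __name__ == "__main__":
    signs = [(a, b) for a in (1, -1) for b in (1, -1)]
    for s in signs:
        p = stabilizer_projector(*s)
        overlaps = [trace(matmul(p, stabilizer_projector(*t))) for t in signs]
        print(s, trace(p), matmul(p, p) == p, overlaps)
    print(pauli_coefficient(GENERATORS[0], (1, 1)), pauli_coefficient(GENERATORS[0], (1, -1)))
```

In this example the four projectors are just the projectors onto the computational basis states, so the checks can also be confirmed by hand.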

Proposition 22 allows us to construct rank-one projection
operators from stabilizer groups. It remains to be shown that
such groups actually exist. The construction below makes use
of the fact that $\mathbb{F}_2^k$ can be identified with the (unique) finite
field $\mathbb{F}_{2^k}$ of order $2^k$ in the sense that there exists a (non-unique)
isomorphism from $\mathbb{F}_2^k$ to $\mathbb{F}_{2^k}$ which respects the
additive structure. In this way, we can assign a meaning to
the product between elements from $\mathbb{F}_2^k$.

Proposition 23. Let $b_1, \ldots, b_k$ be a basis of $\mathbb{F}_2^k$. For each
$x \in \mathbb{F}_2^k$, let $G_x$ be the subgroup of $\mathcal{P}^{(k)}$ generated by
$\{w(b_1, x b_1), \ldots, w(b_k, x b_k)\}$. Then $G_x$ is a stabilizer group.
If $x' \neq x$, then

$$G_x \cap G_{x'} = \{1\}. \qquad (82)$$

Proof: Since

$$b_i (x b_j) - (x b_i) b_j = 0,$$

the generators commute mutually by Eq. (75). Thus, $G_x$ is
Abelian.

From Eq. (74) it follows that all matrices in $G_x$ are of the
form $\pm w(p, xp)$ for $p \in \mathbb{F}_2^k$. This proves Eq. (82).

Combining Eq. (74) with the fact that the $b_i$'s are a basis,
it is easy to see that for any $p \in \mathbb{F}_2^k$ either $w(p, xp)$ or
$-w(p, xp) \in G_x$, but not both. Thus $|G_x| = 2^k$ and, since
$1 = w(0, x0) \in G_x$, it must hold that $-1 \notin G_x$. Hence $G_x$ is a
stabilizer group and we are done. ■

We need one final statement: 

Proposition 24. Any stabilizer group $G$ is isomorphic to the
additive group of $\mathbb{F}_2^k$. Given any $l < k$ elements $\{g_1, \ldots, g_l\}$
of $G$, it is possible to find two distinct characters $\chi_1, \chi_2$ of $G$
which agree on $g_1, \ldots, g_l$.

Proof: The first claim follows from (74). For the second
point, note that $g_1, \ldots, g_l$ span a subspace of $\mathbb{F}_2^k$ of dimension
at most $l < k$. Recall that the complex characters of $\mathbb{F}_2^k$ are
in one-one correspondence with linear functionals $\mathbb{F}_2^k \to \mathbb{F}_2$.
There are $2^{k-l}$ distinct ways of extending a given functional
from an $l$-dimensional subspace to all of $\mathbb{F}_2^k$. ■
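The counting argument in this proof can be illustrated by brute force. In the sketch below, a linear functional on $\mathbb{F}_2^k$ is represented by a bit vector $a$ acting as $v \mapsto \langle a, v \rangle \bmod 2$, and the associated character is $v \mapsto (-1)^{\langle a, v \rangle}$. The function names and the example vectors are assumptions made for illustration, and the count $2^{k-l}$ is only expected when the prescribed vectors are linearly independent.

```python
from itertools import product


def functionals_agreeing(k, vectors, values):
    """All a in F_2^k with <a, g> = c (mod 2) for every pair (g, c)."""
    return [
        a
        for a in product((0, 1), repeat=k)
        if all(
            sum(x * y for x, y in zip(a, g)) % 2 == c
            for g, c in zip(vectors, values)
        )
    ]


def character(a):
    """Character of the additive group F_2^k attached to the functional a."""
    return lambda v: (-1) ** (sum(x * y for x, y in zip(a, v)) % 2)


if __name__ == "__main__":
    k = 3
    vectors = [(1, 0, 0), (0, 1, 1)]  # l = 2 linearly independent elements
    values = [1, 0]                   # prescribed values of the functional
    solutions = functionals_agreeing(k, vectors, values)
    print(len(solutions), solutions)
    chi1, chi2 = character(solutions[0]), character(solutions[1])
    print([chi1(g) == chi2(g) for g in vectors])
```

With $k = 3$ and $l = 2$ the script finds $2^{k-l} = 2$ functionals, and the two resulting characters agree on the prescribed elements while differing elsewhere.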